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ABSTRACT 

This paper describes STECMAP (STEllar Content via Maximum A Posteriori), a 
flexible, non-parametric inversion method for the interpretation of the integrated light 
spectra of galaxies, based on synthetic spectra of single stellar populations (SSPs). We 
focus on the recovery of a galaxy's star formation history and stellar age-metallicity 
relation. We use the high resolution SSPs produced by Pegase-HR to quantify the 
informational content of the wavelength range AA = 4000 - 6800 A. Regularization of 
the inversion is achieved by requiring that the solutions are relatively smooth functions 
of age. The smoothness parameter is set automatically via generalized cross validation. 

A detailed investigation of the properties of the corresponding simplified linear 
problem is performed using singular value decomposition. It turns out to be a pow- 
erful tool for explaining and predicting the behaviour of the inversion, and may help 
designing SSP models in the future. We provide means of quantifying the fundamen- 
tal limitations of the problem considering the intrinsic properties of the single stellar 
populations in the spectral range of interest, as well as the noise in these models and 
in the data. We demonstrate that the information relative to the stellar content is 
relatively evenly distributed within the optical spectrum. We show that one should 
not attempt to recover more than about 8 characteristic episodes in the star formation 
history from the wavelength domain we consider. STECMAP preserves optimal (in 
the cross validation sense) freedom in the characterization of these episodes for each 
spectrum. 

We performed a systematic simulation campaign and found that, when the time 
elapsed between two bursts of star formation is larger than 0.8 dex, the properties 
of each episode can be constrained with a precision of 0.04 dex in age and 0.02 dex 
in metallicity from high quality data (R — 10 000, signal-to-noise ratio SNR = 100 
per pixel), not taking model errors into account. We also found that the spectral 
resolution has little effect on population separation provided low and high resolution 
experiments are performed with the same SNR per A. However, higher spectral res- 
olution does improve the accuracy of metallicity and age estimates in double burst 
separation experiments. When the fluxes of the data are properly calibrated, extinc- 
tion can be estimated; otherwise the continuum can be discarded or used to estimate 
flux correction factors. 

The described methods and error estimates will be useful in the design and in the 
analysis of extragalactic spectroscopic surveys. 
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1 INTRODUCTION halos, bulges, disks and disk patterns, is still controversial. 

Empirical constraints on the formation scenarii are engraved 

ml ,. „ , , , r i ■ -u in the distribution of stellar ages, metallicities, and kinemat- 

1 he diversity ot shapes and colors ot galaxies illustrates the , , 

, , . , , . ,. . ,, , , ics. Unless the galaxies can be resolved into stars, this crucial 

wealth ol physical mechanisms acting m these complex ob- . 



jects. Their formation history, including the building of their 



information must be extracted from integrated spectra. This 
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spectral energy distribution is a recording of the whole life of 
a galaxy: the condition of its birth, the formation and assem- 
bly of its first blocks, its passive evolution and the recycling 
of its material, or its active evolution through merging, all 
these determine the current stellar content. Yet, this infor- 
mation is embedded in a non-trivial manner in the light we 
receive. 

While a wealth of such data is currently being gathered 
from spectroscopic surveys (for example the Sloan Digital 
Sky Survey or the 2DF Galaxy Redshift Survey), using them 
to probe the general properties of the stellar populations on 
a cosmological timescale is an exciting perspective. 

In the literature, the stellar content of a galaxy is of- 
ten characterized by a luminosity weighted age, a luminos- 
ity weighted metallicity, a global velocity dispersion , and a 
parameter characterizing extinction. Since IWorthevI l)l994l . 
the Lick indices have been readily used in order to describe 
the nature of the stellar populations. Spectral indices are 
convenient because they are robust to a number of observa- 
tional perturbations, but they exploit only small wavelength 
domains. The use of a larger fraction, and eventually of all 
the information in a spectrum must, at least in principle, 
help separate, age-date and characterize coexisting stellar 
components, steps required to access the actual evolution of 
the galaxies under study. Individual spectral features with 
specific sensitivities to age or metallicity may add informa- 
tion to the Lick data points, and the redundancy provided 
by many lines spread over a wide spectral range reduces the 
sensitivity to noise. Recently, methods have emerged that 
use t he whole available spe ctral range, relying on compres- 
sion lR eichardt_etall2Q ( < )l^ or on non-negative least squares 
ijMateu et al.ll200ll : ICid Fernandes et alJl200-jh . 

The introduction of these methods gave birth to a field 
of research, whose goal is to measure the cosmic star for- 
mation history by summing the individual star formation 
histories of a large number of galaxies. This results in an 
estimate of the mean history of star formation (a so called 
"Madau plot") in principle free from the uncertain ties re- 
lated to pure emission-line diagnostics (|DoDital200.'tl . More- 
over, the distribution of individual star formation histo- 
ries is even more constraining than a Madau plot alone. 
If feasible, this approach indeed constitutes a very power- 
ful test for the current cosmological models. In fact, such 
techniques have been used recently to support the idea of 
galactic downsizing, i.e. to argue the stellar activity has 
shifted in the recent past towards less massive galaxies, 
something that some authors have presented as a prob- 
lem for hierarchical clustering. As more results of this kind 
are published, it becomes clear that different authors have 
very different conceptions of wh at is a reasonable interpre- 
tation of a galactic spectrum l|Cid Fernandes et all I200H 
iHeavens et"alll2004 ). Indeed, the problem of characterizing 
star formation histories based on a spectrum is strongly ill- 
condi tioned as we will demonstrate extensively b elow (see 
also lMoultaka fc PelatfeOOfilMoultaka et alJl2004h . This re- 
mains true in the restrictive framework of evolutionary pop- 
ulation synthesis, although this approach incorporates the 
simplifying assumption that the intrinsic spectra of mono- 
metallic, single-aged single stellar populations (SSPs) are 
known. Over-interpretation of the data is a common pitfall 
when ill-conditioning is misjudged or overlooked. A useful 
approach to ill-conditioned inverse problems is the maxi- 



mum penalized likelihood, which is formally equivalent to a 
maximum a posteriori likelihood (MAP) . It has been applied 
in the past in a variety of fields in astro nomy such as light 
deproj ection (Kochanck & Rvbicki 199 9), stellar kinematics 
JSaha fc Williamsl Il994l : [Merrittl Il997t Pichon fc Jhiebaud 
1998), image deblurring (Thi ebautll2002l : lThiebautH200RF ' or 
the inter pretation of low res olution energy distributions of 
galaxies llVergelv et aljl200^l . 

This paper discusses the interpretation of high resolu- 
tion optical spectra of galaxies. A maximum resolving power 
R = 10 000 is considered, which is adequate in particular for 
the studies of low mass galaxies or of massive star clusters in 
galaxy cores. We focus on the object's stellar content. The 
simultaneous extraction of the kinematical information with 
a direct extension of the adopted method is the subject of a 
companion paper. Our work is positioned at the interface be- 
tween single stellar population models and observations. Its 
purpose is not to question the particular ingredients and as- 
sumptions of a specific population synthesis code, although 
some of the discuss ion will be specif i c to t he model pack- 
age Pegase-HR of iLe Borgne et, all l)2004t) . since it is the 
first pack age to have provided a si milar spectral resolution 
(see lGonzalez Deleado et alj ([2005) for a medium resolution 
package) . Rather, we intend to clarify how the intrinsic prop- 
erties of a basis of single stellar population spectra can be 
used to infer consequences for the study of composite stellar 
populations. 

The general problem, where additional constraints such 
as positivity of the star formation history are included is 
a non-linear problem. Nevertheless, we give special impor- 
tance to the linear problem because it provides firm footing 
to explain the processes that determine the reliability of a re- 
covered star formation history. It also clearly displays many 
of the features found in the more realistic inversions as well. 

We also study the feasibility of the inversion in different 
observational regimes (in terms of spectral resolution and 
noise), and give simple scaling laws and error estimates to 
predict the accuracy and relevance of the results. The main 
characteristics of our approach are: 

• It is non parametric, and thus provides properties such 
as the stellar age distribution with minimal constraints on 
their shape. 

• The ill-conditioning of the problem is taken into account 
through explicit regularization. 

• Optimal interpretation of the data is achieved by the 
proper setting of the smoothing parameter. 

The organization of the paper is as follows. We start 
in Sect. [2] by describing the inversion problems that will be 
tackled. In Sect. [3J we provide a comprehensive investiga- 
tion of the idealized linear problem of finding the stellar 
age distribution of a mono-metallic, reddening-free stellar 
population. Sect, ^investigates the performance of these in- 
versions in a set of simulations in terms of resolution and 
separability of bursts. Sect. addresses the problem of the 
simultaneous study of stellar ages and metallicities, while 
allowing for extinction (or other transformations of the con- 
tinuum). Conclusions are drawn in Sect.HjJ while the paper 
closes with a discussion for prospects. 
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2 NON PARAMETRIC MODELS OF SPECTRA 

The spectral energy distribution (SED) that we measure for 
each spatial pixel of an observed galaxy results from light 
emitted by coexisting stellar populations of various ages, 
metallicities and kinematics, and from the interactions of the 
stellar light with the interstellar medium (reddening, nebu- 
lar emission). The example of the Milky Way tells us that 
any given stellar population of a galaxy may consist of stars 
with non-trivial dist ributions in age, metallicity, or even 
relative abundances zing et alTl200lt iProchaska et alJ 
l2000l:lGratton et alfe OOO). In principle, age, abundances and 
velocity distributions should thus be treated as independent 
parameters in a galaxy model meant for an exploration with- 
out preconceptions. 

In the following, we will restrict ourselves to simpli- 
fied models that balance, in our view, technical feasibility 
(in view of current models and data) and scientific interest. 
We assume that metallicity describes the stellar abundances, 
mainly because our populatio n synthesis model doe s not al- 
low for abundance variations (iThomas et al] (I2003T) specifi- 
cally address this issue) . Except for the discussion of a more 
general case in Sect.|^| we restrict ourselves to the assump- 
tion of a one to one relationship between stellar ages and 
metallicities. This allows us to search for significant trends, 
as predicted by simple evolutionary scenarii for galaxies. We 
adopt a simple parameterized formulation for extinction. Fi- 
nally, we will deal with stellar populations at rest (or with 
known velocity distributions). 

Emission lines are out of the aim of this study. They 
may be used in the future, in particular to obtain further 
constraints on the youngest stars and on obscuration by 
dust, or to constrain properties of the interstellar medium. 

2.1 The spectral basis 

The basic building block to model the spectrum of an ob- 
served galaxy is the spectral energy distribution S(X, m, t, Z) 
of a star of initial mass m, age t and metallicity Z (mass 
fraction of metals at the formation of the star). Integrating 
over stellar masses yields the intrinsic spectrum B°(X,t, Z) 
of the single stellar population of age t, metallicity Z and 
unit mass: 

fM m „ 

B°(X,t,Z) = TMF(m)S(X,m,t,Z)dm, (1) 

JM min 

where IMF(M) is the Initial Mass Function and M m i n and 
-M max are the lower and upper mass cut-offs of this distribu- 
tion. Assuming that the metallicities of the stars can be de- 
scribed by a single-valued Age-Metallicity Relation (AMR) 
Z(t), it is possible to derive the unobscured spectral energy 
distribution of the galaxy at rest: 

F rcst (A) = j*^ SFR(t)B°(X,t,Z(t))dt, (2) 

where SFR(t) is the Star Formation Rate (i.e. mass of new 
stars born per unit of time, with the convention that t = 
is today) and i max is an upper integration limit, for instance 
the Hubble time. Similarly, £ m i n is a lower integration limit, 
ideally 0. Both t m i n and t max must in practice be set accord- 
ing to the validity domain of the SSP basis B°(X,t, Z(t)). 
The Luminosity Weighted Stellar Age Distribution 



(LWSAD) A(t) gives the contribution to the total emitted 
light of stars of age [t, t + dt]. It is related to the SFR by: 

m A SFR^) £™ B o t> m) dA ( (3) 

where A A = A max — A m i n is the width of the available wave- 
length domain. In order to use the luminosity weighted stel- 
lar age distribution, we define the flux-normalized single 
stellar population basis B(X, t, Z) where each spectrum is 
normalized to a unitary flux: 

B[X,t,Z) = *2 X ' t,Z) • W 

AA A mi „ B °^ Z ^ 

Using A(t), B(X, t, Z) and Z(t), the unobscured spectral en- 
ergy distribution of any composite population at rest reads: 

Frest (A) = ^ A(t) B(X, t, Z{t)) At . (5) 

For a given single stellar population basis, dealing with 
the star formation rate or the luminosity weighted stellar 
age distribution is apparently equivalent. Yet, because of the 
strong dependence of the mass-to-light ratio of single stellar 
population fluxes on time, A(t) is more directly related to 
observable quantities than SFR(t). We therefore prefer the 
formulation based on A (see also Sect. 14. Ot . 

Many codes are available to construct B(X,t,Z). The 
single stellar population library adopted her e is com- 
puted with Pegase-HR ]Le Borgne et al] l2004l. a version 
of Pegase 1 that provides optical spectra at high resolu- 
tion (R = 10 000), based on the ELODIE stellar library 
JPrugniel fc Soubirarj200ll) . It consists of single stellar pop- 
ulations generated by single instantaneous starbursts with a 
set of metallicities ZZ = [0.0001, 0.1]. The wavelength range 
of the spectra is AA = [4000A, 6800A], sampled in SX = 0.2 
A steps. Fig. shows example spectra of such single stel- 
lar populations, at fixed metallicity (panel a) and fixed age 
(panel b). The large number of lines is supposed to improve 
the accur acy of stellar content analysis. The IMF used is de- 
scribed in lKroupa et al] lll995t) and the stellar masses range 
from 0.1 M Q to 120 M . The IMF is an input of Pegase- 
HR, that we do not attempt to constrain. On the contrary, 
we assume it is universal and known a priori. The generated 
spectra are con sidered most reliable fr om tmin = lOMyr to 
t max = 20Gyr ]Le Borgne et al.ll2004l) . The spectra of the 
different single stellar populations are computed for a set St 
of logarithmically spaced ages between t m i n and t max . The 
set of mono-metallic single stellar populations obtained is 
referred to as the basis or kernel in the rest of the paper. 

2.2 Extinction models 

In most cases, the intrinsic emission of the stars of a galaxy 
is affected by dust. Both the composition and the spatial 
distribution of the dust determine the extinction. The ISM 
of galaxies is rarely homogeneous, and the stars may be 

1 Projet d'Etude des GAlaxies par Synthase Evolutive, 
http : //www . iap . f r/pegase 
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(a) Solar metallicity, SSP of age 50, 400, 2500, 15000 Myr 
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Figure 1. Example of high resolution single stellar populations produced by Pegase-HR. Panel a: solar metallicity single stellar 
populations of age 50, 400, 2 500, and 15000 Myr (from top to bottom). Panel b: 1 Gyr single stellar population for several metallicities, 
Z= 0.05, 0.02, 0.004 and 0.0004 (from top to bottom). The spectra are normalized to a common mean flux and offset for clarity. 



STEllar Content via Maximum A Posteriori 5 



seen through different amounts of dust. One could there- 
fore envisage an age-dependent extinction law or extinction 
parameter. Indeed, there is evidence that the obscuration 
of an ensemble of stars varies systematically with age over 
the first ~ 10 7 years of their evolution, while these young 
stars leave or destroy their parent molecular clouds (Char- 
lot & Fall, 2000, and references therein). However, the early 
epochs relevant to starbursts are currently slightly out of 
reach with Pegase-HR, although they will b ecome accessi- 
ble wi th improvements of the stellar library. IVergelv et al.l 
( 2002) suggest that recovering such a trend with age is possi- 
ble with high quality data ranging from the ultraviolet to the 
infrared. In this paper, we deliberately chose not to search 
for an age-dependence of extinction. The main reason is that 
we are considering only a limited section of the electromag- 
netic spectrum. We postpone a systematic study to future 
work. In the following, we adopt a unique extinction law 
/ext(-E, A) parameterized by the color excess E = E(B — V) 
and normalized to have a unit mean. Accounting for extinc- 
tion, the model spectral energy distribution then reads: 

F r8 st(A) = /ext(£,A) j t A(t)B(\,t,Z(t))dt. (6) 

Note that / ext can be a function of more than one time- 
independent parameter, and may for example be a more 
complex attenuation law, function of the distribution of dust 
in the galaxy and its mixing with the stars, or a low order 
polynomial accounting for the instrumental spectrophoto- 
metric calibration error. 



2.3 General properties and problems with SSPs 

Synthetic spectra of single stellar populations are the build- 
ing blocks involved in the interpretation of galaxy spectra. 
Their properties have a strong effect on the behaviour of the 
inversion problem. 

Both the theory of stellar evolution and observations 
tell us that single stellar population evolution with time is 
fundamentally smooth in the optical except for a number of 
specific evolutionary transitions (e.g. helium flash, carbon 
flash, supernova explosion, envelope expulsion at the end of 
the Asymptotic Giant Branch), and that it shows some lin- 
earity. This means, for instance, that a 500 Myr old popula- 
tion looks very similar to the average between a 600 Myr and 
400 Myr old one. Our ability to identify the differences de- 
pends strongly on the signal-to-noise ratio (hereafter SNR) 
of the models and data. Section shows how to quantify 
this quasi-linearity and its consequences. 

The synthetic spectra of single stellar populations are 
affected by uncertainties in the stellar evolutionary tracks 
and in the stellar library used to construct them. Despite 
permanent progress, some aspects of stellar evolution remain 
difficult to model (e.g. the Horizontal Branch, the Asymp- 
totic Giant Branch, the Red Supergiant phase; effects of 
convection, of rotation, of a binary companion). The errors 
propagate to the SSPs, resulting in unknown systematic er- 
rors in age and metallicity estimates. Some insight to the 
amplitude of these errors is given by the direct comparison 
between results obtained using different sets of tracks. Nev- 
ertheless, it is beyond the scope of this paper to discuss the 
pros and cons of the different set of tracks and the reader 



is refe red to lCharlot et al.l lll996l) and lLeieune fc Fernandesl 
(2002) for an extensive discussion. 

The input library of stellar spectra can be either em- 
pirical or theoretical. The latter situation has the advantage 
of providing spectra for any parameter set (T, g, Z) with no 
observational noise. However, these are not free of intrin- 
sic uncertainties, due for instance to shortcomings of atomic 
and molecular data, to assumptions on partial thermody- 
namical equilibrium, or to inappropriate abundance ratios. 
Empirical spectra, on the other hand, are hampered by a 
number of issues: 

(i) The library is discrete. Therefore interpolation be- 
tween existing stars is needed. This can be a tricky issue, 
especially on the borders of the grid and in underpopulated 
regions of (T, g, Z) space. Moreover, when stars are interpo- 
lated, the noise patterns are also carried along. We will see 
in Sect. l3.4l that this has noticeable effects on the behaviour 
of the inverse problem. 

(ii) The library generally consists only of Milky Way or 
even Solar Neighbourhood stars. Thus, the solar metallic- 
ity is the best populated region of parameter space, while 
other regions may be depleted, especially for extreme cases 
as young metal poor or old metal rich stars. We also know 
that outer galaxies may involve abundance ratios that are 
not found within the Milky Way. One example is found in 
the metal-rich and a-enhanced populations of large ellipti- 
cal galaxies. This difficulty is known as template mismatch 
and results in biases that would be best studied using sim- 
ulations based on theoretical spectra with various sets of 
abundances. The library used in Pegase-HR is known to 
be deficient in high metallicity, hi gh a-element abundance 
red giants ihe Borgne et alJbOO^ . which may lead to an 
over-estimate of age or metallicity in observed galaxies 2 . 

(iii) Empirical stellar spectra have a finite SNR, and so 
do the averaged or interpolated spectra involved in the syn- 
thesis of a galaxy spectrum. It should then be considered 
useless to observe stellar populations at SNR's larger than 
the library's. 

(iv) The fundamental parameters of each star in the li- 
brary are estimates, in the case of Pegase-HR based on 
a subset of stan dards and the automated code TGMET 
jKatz et alJll998fi . Even though error bars on these param- 
eters are provided, some glitches and outliers happen. The 
final error resulting from interpolating between correct and 
ill-parametered stars and summing is unknown. 

Notwithstanding the above limitations of spectral synthesis, 
our purpose here is to investigate the behaviour of the in- 
verse method for a given model. Hence, in this paper we will 
be restricted to one given SSP model. 



3 A SIMPLIFIED INVERSE PROBLEM: THE 
AGE DISTRIBUTION RECOVERY 

This section discusses the inverse problem of recovering the 
age distribution of a purely mono-metallic unobscured pop- 
ulation at rest. This simplification is deliberate and yields a 



2 Work is being done to improve the underlying library. 
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linear relationship between the observed spectral energy dis- 
tribution F rcst (A) and the stellar age distribution A(t). It al- 
lows us to address its fundamental properties and behaviour, 
characterized by simple quantities and criteria. These turn 
out to be precious tools in the process of understanding and 
diagnosing the ill-conditioning and pathological behaviour of 
such a problem and their non linear generalization. It also 
allows us to introduce the automated regularizing method 
required to solve the problem in practice. 

3.1 The linear inverse problem 

Our idealized mono-metallic unobscured model stellar popu- 
lation is characterized by its luminosity weighted stellar age 
distribution A(i) and its constant age-metallicity relation 
Z(t) — Zo, the spectral energy distribution of the emitted 
light -F r est(A) then reads: 

F rcst (A) = A(t)B(X,t,Z(t))dt, (7) 

where B(X,t.Z(t)) is the flux-normalized single stellar pop- 
ulation basis (cf. Eq. ||1J) which is just a function of the 
wavelength and time as the AMR Z(t) is supposed to be 
known. Solving Eq. Q where B(X,t,Z(t)), and i ? rost (A) are 
given and A(t) is the unknown, is as we will demon strate, a 
classi cal example of a potentially ill-posed problem jHansenl 
1994), i.e. it can be shown that small perturbations of the 
data can cause large perturbations of the solution. Hence any 
noise in the data, F lest (X), or in the kernel, B(X, t, Z(t)), can 
lead to a solution very far from the true solution. 

3.2 Discretization: the matrix form 

Intuitively, after discretization of the wavelength and age 
ranges, the linear integral equation can be approximated 
by: 

n 

st^^BijXj , i€{l,..,m}, (8) 

j=i 

with: 

Si = (-Frost (A)) AgAAi , 

Bilj = (B(X,t,Z(t))) xeAK teAt] , ( 9 ) 
ay = <A(i)) t€At . , 

where the notation, e.g., (-Fr es t(A)}AeAA i indicates some kind 
of weighted averaging or sampling of the argument -F res t(A) 
over the i-th wavelength interval AAi and similarly for the 
age interval. 

More rigorously, let {gi : [A min ,A max ] i-> R; i = 
l,...,m} and {hj : [t mia ,t m e^] i-» R;j = l,...,n} be 
two ortho-normalized bases of functions spanning the wave- 
length and age intervals respectively. Then the best approx- 
imation 3 of A(t) writes: 

A(t) ~ ^2 Xj hj (t) , with Xj = / A(t) hj (t) dt , (10) 

3=1 •* 

3 In the sense of the I2 norm defined by the ortho-normalized 
basis of functions. 



similarly, the best approximation of -Frest(A) writes: 

m „ 

F ICBt (X) « ^ Sj ffj(A) , with Si = F ICBt (X) gi(X) dX . (11) 

It is straightforward to obtain the coefficients of the matrix 
B in Eq. JHJ by inserting these approximations in Eq. Q: 

Bi,i = jj B{\t,Z{t))g i {X)h j {t)AtdX. (12) 

In practice, we adopt equally spaced Xi and equally 
spaced log(t,) to sample the wavelength range and the evo- 
lutionary timescales of single stellar populations. Then we 
simply use gate functions for gi and hj. In other words, Si is 
the average flux received in Xi ± <5A and Xj is the mean flux 
contribution of the sub-population of age \tj-i,tj] - hence 
the notation used in Eqs.|U] 

Note that if tj — tj-i is too large, significantly differ- 
ent populations are already entangled in the sampled basis 
Bj(X) = (B(X, t, •^(^))) tgAt • F° r this reason the number 
n of single stellar population elements in the basis should 
not be too small. The signatures of the populations of each 
age should be expressed in the adopted basis. On the other 
hand (see Sect. 13.41 . we will sometimes want to use a small 
n, i.e. a basis that is coarser in time, and we will see that the 
overall adopted value strongly depends on the observational 
context (SNR, spectral resolution and range . . . ). 

Using matrix notation and accounting for data noise, 
the observed SED reads: 

y = B-x + e, (13) 

where y = (yi, . . . ,y m ) T is the observed spectrum (includ- 
ing errors), i.e. yi is the measured flux in the range Xi ± SX, 
and e = (ei,...,e m ) T accounts for modelling errors and 
noise. The vector of sought parameter x is the discretized 
stellar age distribution, i.e. the Xj is the luminosity contri- 
bution of the stars of age [tj-i,tj] to the total luminosity, 
averaged over the available wavelengths. The vector s = B x 
is the model of the observed spectrum and B is the discrete 
model matrix, sometimes also referred to as the kernel. 

3.3 Maximum a Posteriori 

In a real astrophysical situation, the data y is always con- 
taminated by errors and noise. Following Bayes' theorem, 
the a posteriori conditional probability density / pos t(x|y) 
for the realization x given the data y writes: 

/post(xjy) OC /data(y|x) /prior (x) , (14) 

where / pr ior(x) is the a priori probability density of the pa- 
rameters, and /data(y|x), sometimes referred as the likeli- 
hood, is the probability density of the data given the model. 
For Gaussian noise, /data(y|x) tx exp[— \ x 2 ( v l x )]i with: 

X 2 (y|x) = (y - s(x)) T -W- (y - s(x)) , (15) 

where the weight matrix is the inverse of the covariance ma- 
trix of the noise: W = Cov(e) -1 . Maximizing the posterior 
probability ill H is equivalent to minimizing the penalty: 

Q(X) = X 2 (y|x) - 2 log (/prior(x)) . (16) 

Without a priori information about the sought parameters, 
the probability density / pr ior is uniformly distributed and 
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this term can be dropped. In this case, Q(x) simplifies to 
X 2 (y|x), the traditional goodness of fit estimator for Gaus- 
sian noise. 

When the errors are uncorrelated the matrix W for- 
mally assigns a weight 1/Var(i/;) to each pixel i of data. 
Practically, one may want to modify the variance-covariance 
matrix in order to use it as a mask. For example, a dead pixel 
can be assigned null weight. In the same way, we may also 
mask emission lines. Because of this particular usage of the 
matrix W, it will often be called the weight matrix. It need 
not be exactly a variance-covariance matrix, even though it 
can be built upon one. 



3.4 Ill-conditioning and noise amplification 

As mentioned earlier, the linear problem corresponding to 
the recovery of the stellar age distribution x by maxi- 
mizing the likelihood term only, qualifies as a discrete ill- 
conditioned problem, i.e. it might therefore be extremely 
sensitive to noise, both in the data and in the kernel. It thus 
will require some form of regularization in order to obtain 
physically meaningful solutions. 



Signal and noise singular coefficients 
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3.4-1 Noisy data 

First, let us see how ill-conditioning arises, in the case of 
a noiseless kernel but with noisy data. We solve for x by 
maximizing the likelihood of the data y given the model; 
this is the same as minimizing: 



X (y|x) = (y-B-x) ■ W (y-B-x) 



(17) 



with respect to x. The solution is the weighted least squares 
one: 



x ML = (B 1 ■ W ■ B) B W y . 



(18) 



For sake of simplicity, we will consider stationary noise in 
this section. The results of this section however apply for 
non- stationary noise by replacing the model matrix B by 
KB and the data vector y by Ky where K is the Choleski 
decomposition of the weight matrix, i.e. W = K T K. For 
stationary noise, the weight matrix factorizes out: 



X 2 (y|x) oc (y-B-x) 



B x) 



(19) 



and the maximum-likelihood solution becomes the ordinary 
least squares one: 



XML 



fB T B) 



B 



y • 



(20) 



In order to clarify the process of noise amplification, we in- 
troduce the singular value decomposition of B as: 



B = U S V 



(21) 



where £ = diag(<ri, 02, ■ ■ ■ , cr n ) is a diagonal matrix carry- 
ing the singular values, sorted in decreasing order, of B on 
its diagonal. U contains the orthonormal data singular vec- 
tors u; (data-size vectors), and V contains the orthonormal 
solution singular vectors Vj (solution-size vectors). Replac- 
ing B by its singular value decomposition in Eq. 12011 yields: 



XML 



V- S" 



u T -y = £ 



(22) 



Figure 2. The decay of the singular values of the kernel (crosses) 
is the origin of the bad behaviour of the problem, through the am- 
plification of the last singular vectors. In this example, the data 
y is perturbed by Gaussian noise of constant SNR^ = 100 per 
pixel. The unperturbed singular coefficients (white squares) de- 
cay, while the noise singular coefficients (black diamonds) remain 
spread around l/SNRj for any i (we chose (y) = 1 in this ex- 
ample). The perturbed singular coefficients • y are thus noise 
dominated as soon as i 7 — 9, and so are the terms of the 
SVD solution (Eq. 1221 1. The increasing difference between the 
true and noise singular coefficients is worsened by the division by 
smaller <r;. The solution x is dominated by the last few solution 
singular vectors, and its norm is purely noise dependent. 



The solution is obtained as the sum of n solution singular- 
vectors Vj times the scalar uj ■ y/o~i. For real data, we have 
y = y + e, where the noiseless data y is related to the true 
parameter vector x via y = B ■ x. Instead of x, the solution 
recovered from the noisy data reads: 



XML 



X + x c 



(23) 



Thus, we recover the true unperturbed solution x plus a 
perturbation, x e , related to the noise. Comparing x and 
x e is equivalent to comparing the unperturbed singular co- 
efficients ■ y and the noise singular coefficients • e. 
Figure |21 shows an example with 40 logarithmical age bins 
from 10 Myr to 20 Gyr, and where the data is perturbed by 
Gaussian noise and has constant SNRd = 100 per pixel (the 
subscript d stands for data). The figure shows that the sin- 
gular values decay very fast and span a large range, giving 
a conditioning number, defined by CN = ai/a n fa 10 8 char- 
acteristic of an ill-conditioned problem. Note that B is the 
flux-normalized SSP basis defined by Eq. 0, i.e. each spec- 
trum of the basis has unitary flux, and the x% are thus flux 
fractions and not mass fractions (see Sect. l4.Ql for more de- 
tails) . The noise singular coefficients remain rather constant 
for any rank i. Indeed, uj ■ e involves a normalized vector 
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times noise, and has a constant statistical expected value of 
(y)/SNRd. On the contrary, the unperturbed singular coef- 
ficients decay. In this example, the model x is a Gaussian 
centered on 1 Gyr, and we find that changing the mean age 
of the model does not significantly affect the decay of the 
Ui ■ y (see Appendix [XJ. We can thus define two regimes, 
with a transition for io as 7 — 9 in this example: 

• For i ^ io we have uj ■ y ~ uj ■ y and the singular 
coefficients and modes are set by the unperturbed signal y. 

• For i > io we have • y ~ uj ■ e ~ (y)/SNRd. The 
singular coefficients are set by the noise in the data and 
saturate. 

The division by decreasing Oi makes the high rank terms in 
x e become very large. The solution x is thus dominated 
by the last few Vi. Its norm is several orders of magni- 
tude larger than the true solution. We see that, for such 
ill-conditioned problems, pure maximum-likelihood estima- 
tion results in huge noise amplification and useless solutions. 

The origin of ill-conditioning is, in most part, physi- 
cal: it lies in the evolution of the single stellar populations, 
which is dictated by stellar physics and the relevant stellar 
evolution models. One aspect of the situation is illustrated 
in Fig. |21 It shows a map of the \ 2 distances between the 
spectra (i.e. columns) of the kernel B, for different SNRs. In 
this figure, the time interval [50Myr, I5Gyr] was arbitrarily 
divided in 40 logarithmic age bins, and the SSP basis is flux 
normalized as in Eq. J1J . It shows that for low SNRs (of or- 
der 10), one element of the basis can not be quantitatively 
distinguished from its neighbours within a typical log age 
interval of ~0.5 dex. It also makes it clear that the loga- 
rithmic age-resolution of any inversion method will not be 
constant all over the time range. 

3.4-2 Noisy correlated kernel 

As discussed in Sect. 12.31 the models which are constructed 
from observed spectra, are also contaminated by observa- 
tional noise. Let us investigate the expected signature and 
basic properties of a noisy kernel. 

PEGASE-HR single stellar populations have a noise 
component estimated to SNRb ~ 200 per 0.2 A pixel (the 
subscript b stand s for basis). F rom theoretical studies of 
random matrices trlansen|ll988t) . it is known that a hypo- 
thetical noiseless single stellar population basis perturbated 
by adding white noise of root mean square ao should have 
its singular values settle around i/mao, where m is the 
number of samples in the observed spectral energy distribu- 
tion. If the spectra are normalized to unitary flux we have 
ao — 1/SNRb. Figure 21 shows the singular values of the 
flux-normalized kernel B (thick line). The singular values 
clearly do not settle around the value expected for m ~ 10 4 , 
i.e. w 1 for SNRb = 100 (dash-dotted line) and « 0.1 for 
SNRb = 1000 (dash-double-dotted line). On the contrary, 
their decay is typical of an ill-conditioned noiseless kernel, 
as if the single stellar populations involved had infinite SNR. 
Let us investigate some details of the synthesis process, in 
an attempt to explain this unexpected property. 

As every single stellar population is actually the 
weighted sum of p single stars from the library, the noise 
level of the synthetic spectral energy distribution should be 
lower (typically divided by *Jp). However, the singular values 
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Figure 3. Distance map of the spectral energy distributions in- 
volved in the flux-normalized kernel B . The contours enclose a do- 
main where the ith spectrum can not be distinguised against the 
j'th at a 90% confidence level. The solid contour is for SNR = 100 
per pixel and the dash-dotted one is for SNR = 10 per pixel. It 
is not possible to unambiguously disentangle two spectra in such 
regions, i.e. the resolution in age of any inversion method can not 
be finer than the width of these regions (which is read on the 
axis), and it is not constant all along the age range. This reso- 
lution in age in data space has a counter part in the resolution 
defined in Sect.lPI 



of the kernel plus white noise at a level SNR = 1 000 (cor- 
responding to summing p — 100 stars having SNR = 100) 
are still much larger than the initial kernel's singular values. 
Having more stars available would lower the saturation level, 
but one would need 10 10 stars with SNR = 100 to make the 
saturation vanish. 

In order to test for the effect of wavelength resampling 
of the individual stellar spectra, we added SNR = 100 per 
pixel smoothed noise (i.e. noise with a correlation between 
neighboring wavelengths) to the kernel. The corresponding 
singular values are very similar to the former white noise 
case, except that they settle to a slightly smaller value. They 
still saturate high above the singular values of the initial 
kernel. 

In contrast, when the added noise pattern is correlated 
in the direction of ages instead of wavelength, one obtains 
a non-saturated singular value spectrum very similar to the 
initial kernel, even with SNR as low as 100 (a larger SNR 
would make it look even more similar). 

Indeed, such correlated noise arises in part in the ker- 
nel because individual stellar spectra are interpolated in 
(T, g, Z) space. 

A single spectrum from the input stellar library can 
thus significantly contribute to several ages. For instance, 
the same limited number of red giants will be used (with 
slightly different weights) to represent the red giant branch 
stars over a range of ages and metallicities. Their noise pat- 
terns will show up in several consecutive synthetic single 
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spectral signature of models noise 
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Figure 4. Investigation of the noise signatures of the kernel. For 
comparison, the kernel was noised in several different ways: with 
white noise, oversampled noise and finally noise correlated in the 
age direction of the kernel, each type of noise producing char- 
acteristic features in the singular values. The expected spectral 
signature of the noise in the initial basis (saturation of the singular 
values) does not occur. This is likely to be caused by the interpo- 
lation between the stars of the stellar library: the noise patterns 
are carried along in the interpolation, giving rise to noise patterns 
correlated in the direction of ages. 

stellar populations, and can therefore not be properly dis- 
criminated against true physical signal. The expected satu- 
ration is washed out by the interpolation between spectra, 
resulting in a degraded signature. This correlation affects 
us in two ways: it prevents us from determining the precise 
SNR of the basis, and then from computing the condition- 
ing number of the real problem (where SNRb — > oo). Only a 
lower limit on the conditioning number is obtained, meaning 
the real problem could actually be worse. 

Whatever process is responsible for degrading the noise 
signature, the properties of the problem in very high quality 
data regimes can not be inferred from the apparently noise- 
less initial kernel B. Let's return to the case of white noise, 
with a noisy kernel B + E. Its singular values saturate at 
some rank is- The singular vectors of lower rank are iden- 
tical to those of B but for higher rank, they differ strongly. 
Thus, the number of free parameters we can recover can 
not be larger than ib- For Pegase-HR we estimate is — 6 
for SNRb ~ 200. This means that high frequency variations 
of the stellar age distribution are unreachable, no matter 
what is the SNR of the data. This is a fundamental lim- 
itation of the problem, related specifically to the SNR of 
the single stellar population models. When SNRd ^ SNRb, 
a pure maximum likelihood estimation actually uses noise 
patterns inside the kernel as if it was true physical signal, 
and simulations will give results with an illusory accuracy. 
A useful technique, which explicitly accounts for modeling 
errors, is then total least squares (hereafter TLS). The total 
least squares solution to our linear problem (for simplicity 



we set W to Identity here) is defined by: 

XTLS = axgmin (||y - B • xf + ||B - B|| 2 ) , (24) 

x, B 

where ||x|| = \/x T ■ x d enotes the Euclidian (or £2) 
norm. More can be found in lHansen fc O'Learvl jl996l) and 
lOoluhetaljEoOri. 

However, in the rest of the paper, we will most fre- 
quently explore regimes where the dominant error source is 
the data, so that the number of degrees of freedom of the 
problem is dictated by SNRd rather than SNRb. It will also 
allow us to estimate what could be the best performance 
of the method, if the single stellar population models were 
taken as perfect. Thus, in the following sections, we will fo- 
cus exclusively on the treatment of noisy data, and will often 
drop the subscript "d" . 

3.5 Regularization and MAP 

This section explains how adequate regularization allows us 
to improve the behaviour of the problem with respect to 
noise in the data. Perturbation of the solution arises from the 
noise-dominated higher rank terms of Eq. 1221 . In order to 
ensure that x e remains small, one could reduce the effective 
number of age bins. Several criteria are applicable. 

• The singular coefficients should always be dominated by 
the true signal. With plots such as Fig. |2] we find that io is 
between 7 and 9 for SNR d = 100 per pixel with Pegase-HR 
single stellar populations. Nevertheless, in a real situation 
only uj y is generally available, and io is guessed from the 
rank for which the singular coefficients begin to saturate. 

• In the true signal dominated region, the singular coeffi- 
cients decrease faster than the singular values. Inversely, sin- 
gular coefficients decreasing faster than the singular values 
for any rank i guarantee the smallness of x e . This require- 
ment is known as the discrete Picard condition. See lHansenl 
lll994l) for further details. 

• A useful criterion that does not require any plot involves 
choosing the number of age bins n so that the conditioning 
number of the resulting kernel satisfies 

CN = <ri/<7„ < ^/mSNR d , (25) 

where m is the number of pixels. 

Note that this statement is SNR dependent. 

Another way to prevent the noise component from being 
amplified into the solution is to truncate the SVD expansion 
at some rank itruni> 

Hrunc T 

xtsvd = V — - Vi . (26) 

This technique is known as truncated S VD (hereafter 
TS VD). The use of this method dates back to lHansorJ dl97ll) 
and IVarahl lll973l) . The truncation rank i tr unc can be chosen 
with the help of plots such as Fig. [2] 

However, if the truncation is brutal, it will produce 
strong artifacts, known as aliasing, which reflects the fact 
that higher frequencies are projected onto a low frequency 
basis; the best fit leads to a non local alternated expan- 
sion which rings. Moreover, TSVD is best suited for prob- 
lems where a clear gap in the singular values is seen be- 
cause in this instance, the lower modes are well represented 
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by the truncated basis. Unfortunately, our kernel displays 
a smooth, continuously decreasing spectrum of singular val- 
ues. This is very similar to the situation in image reconstruc- 
tion. When deconvolution problems are addressed, the bru- 
tal truncation of the transfer function (which corresponds to 
the singular coefficients of the point spread function, here- 
after PSF) results in the formation of strong artifacts known 
as Gibbs rings. 

Moreover, we here have an other degree of complex- 
ity arising from the property that our problem is not shift- 
invariant. As a consequence, the solution singular vectors 
are fairly unsmooth and even more artifacts are expected 
as discussed in Sect. 14.1.21 In image deblurring, artifacts 
are reduced and reconstructions improved by apodizing the 
Fourier transformed PSF (i.e. making it smoothly decrease 
to 0), for example by Wiener filtering. 4 In a similar manner, 
we wish to apodize the singular value spectrum of the kernel 
B. 

We chose to regularize the problem by imposing the 
smoothness of the solution through a penalizing function. 
We define the objective function as 



defined in lPichon etafl (12002.) by 



*(x) 



log(/ P ost) = X (s(x))+/xP(x). 



(27) 



which is a penalized \ 2 , where P is the penalizing function: 
it has large (small) values for unsmooth (smooth) x. Adding 
the penalization P to the objective function is exactly equiv- 
alent to injecting a priori information in the problem. We 
effectively proceed as if we assumed a priori that a smooth 
solution was more likely than a rough one. This is in part jus- 
tified by the fact that any unregularized inversion tends to 
produce rough solutions. If we identify Q M with the expres- 
sion of the logarithm of the maximum a posteriori likelihood 
II (it we see that by building a penalization P we have built 
a prior distribution / pr i r 



/prior (x) = exp (-flP(x)) , 



(28) 



omitting the normalization constant. If y, = 0, the prior 
distribution is uniform and contains no information. It is 
a pure maximum likelihood estimation. If fj, > the prior 
probability density is larger for smooth solutions, and we are 
performing a maximum a posteriori likelihood estimation 
(MAP). 

The smoothing parameter \x sets the smoothness re- 
quirement on the solution. There are several examples 
of such regularizations in the litterature (Tikhonov, least 
squares with qu adratic constraint, m aximum entropy regu- 
larization . . . see lPichon et al.l i2002f) for a discussion). Here, 
we define P as a quadratic function of x, involving a kernel 



P(x) = x T 



(29) 



If L is the identity matrix I n , then -P(x) is just the square 
of the Euclidian norm of x. To explicitly enforce a smooth- 
ness constraint, we can use a finite difference operator 
D2 = diag 2 [— 1, 2, — 1] that computes the Laplacian of x, 



Non quadratic penalty functions, such as £±-£2 penalties which 
accomodate rare sharp jumps in the sought field, can also signif- 
icantly reduce the effect of ringing. 



D 2 







-1 







-1 

2 -1 



(30) 

The objective function Q M is then quadratic and has an ex- 
plicit minimum: 

x M = B y = (b t -W-B + m L t -L) 1 -B T -W-y, (31) 

where B is defined here to be the regularized inverse model 
matrix, whose properties we will investigate below. 

We may now derive a more insightful expression for x M 
while relying on the generalized singular value decomposi- 
tion (hereafter GSVD) of (B, L) (assuming W = I m or using 
the Choleski square root of W). According to Appendix ICl 
the regularized solution now writes: 

x M = argmin (||B ■ x - y|| 2 + fi ||L ■ x|| 2 ) , 



= [B • B + /x L ■ 
= V • [E 2 + n 2 ] ~ 

n 

= J2 Vi ( u » T ■ y ) Vi 
j=i 

where the filter factors r^: 
Vi = 



B y , 
S-U T -y, 



(32) 



(33) 



depend on the type of penalization and the smoothness pa- 
rameter p. For any quadratic penalization as in Eq. 129H . 
the matrices U, V, S = diag(<7i, (72, . . . , cr n ) and = 
diag(0i, 62, ■ ■ ■ , 9 n ) are given by the generalized singular 
value decomposition of the matrix pair (B,L) (see Ap- 
pendixEJfor details) . For the simple case of square Euclidian 
norm penalization, L = I n , the filter factors becomes: 

We then have 77; « l/cn when of 3> /J-, and 77^ — > for 
higher ranks (i.e. smaller singular values), so that division 
by almost is avoided in high rank terms. Thus, setting fi 
actually sets the rank where the weights of the SVD solution 
components begin to decrease. Note that the smooth cutoff 
(apodization) of the singular values should allow us to re- 
cover models similar to relatively high rank singular vectors 
provided that the weights associated to lower rank vectors 
are small enough. Small fi yield noise sensitive, possibly un- 
physical solutions, whereas very large fi lead to flat solutions 
whatever the data. The choice of thus appears as a critical 
step, and should give a fair balance between smoothness of 
the solution and sensitivity to the data. 



3.6 Setting the weight for the penalty: n 

The optimal weighing between prior and likelihood is a cen- 
tral issue in MAP since it allows us to taylor the effective 
degree of freedom of each inv ersion to the SNR of the data. 
See, e.g., iTitteringtonl 1^985) for an extensive comparison 
between various methods for choosing the value of the hyper- 
parameter \JL. 
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Figure 5. Histograms of the distribution of /^GCV f° r a linear stellar age distribution inversion with 60 age bins, and several SNR per 
pixel and penalizations. From left to right: Euclidian norm, Laplacian and D3 penalizations. The distributions are vertically offset for 
readility, and the SNR is given for each of them. The median of these distributions give the GCV-optimal smoothing parameter for each 
SNR and penalization. It is well defined in all cases except for very low SNR = 5 per pixel. The median parameter increases with the 
order of the penalization and decreasing SNR. Note the skewed distributions (this is quite generic in GCV). 



3. 6. 1 The automatic way: generalized cross validation 

Generalized cross validation (GCV) is a function of the pa- 
rameter fj,, the data and the kernel B, defined as 



GCV(/x) 



IBB 



tr 2 (l B ■ B) 



(35) 



where B is the regularized inverse model, defined by Eq. I|31[l 
and tr(-) is the trace of its argument. The min imum of GCV 
optimizes the predictive power of the solution l)Wahbal l99C0. 
in the sense that if any pixel is left out of the data, this 
pixel's value should still be well predicted by the correspond- 
ing regularized solution. For quadratic penalizations, one 
may obtain very simple expressions for the GCV function, 
speeding up its computation, and therefore the determina- 
tion of \i by several orders of magnitude. Using the GSVD 
of (B, L), we can derive: 



where 



GCV (/j) = 



= 1 



e; 



= i (pi "i T ■ y) 2 



(Er=i p' 



(36) 



(37) 



where <7; and #; are the singular values obtained from the 
generalized singular value decomposition of the matrix pair 
(B, L) (see AppendixlUl . Note that the fi in the denominator 
of pi factorizes out in the expression of GCV(/x). 

When available, the minimum of GCV provides a good, 
data quality-motivated value for fi. Moreover, GCV has been 
extendedly tested and applied by a number of authors, in 
several fields of physics. Figure |5] shows distributions of 
/^gcv for a mono-metallic inversion for several SNR and pe- 
nalizations. Each histogram results from 150 experiments. 
The GCV determination of the smoothing parameter is suc- 
cessful over a wide range of SNR, in the sense that the his- 
togram shows a clear maximum. This maximum is best de- 
fined for the Tikhonov penalization (square of the Euclidian 
norm). With laplacian and higher order penalizations, espe- 
cially for low SNR, the GCV values are more widely spread. 



Nevertheless, we can still obtain a useful value by extrapo- 
lating the higher SNR fi down to the desired SNR. 



3.6.2 Empirical approach: trial and error 

GCV and most of the automated smoothing parameter 
choice methods were designed for linear problems. In the 
case of non-linear problems, it can provide a useful value 
for /1 t o start with, but fin e empirical tuning is also re- 
quired llCraig fc Brownlll986f) . For instance, when positivity 
is imposed through reparameterization or gradient clipping, 
H should be smaller than fiGCv- Indeed, since the positive 
problem has a better behaviour than the full linear one, it is 
expected that GCV overestimates fj,. One can thus afford to 
lower it to some extent without threatening the relevance of 
the solution. As a consequence, finer structures can be recov- 
ered. To set n for the positive problem, we used the simple 
following procedure. First, we set /j, = fiGCv- We produce 
mock data, and perform successive inversions, while decreas- 
ing /i. As a consequence, finer structures are recovered. At 
some point, we will enter a regime where the structures of 
the solution can be identified as artifacts. This transition 
defines a lower limit above which p: should remain. 



3.7 Where is the age information? 



Which spectral domains or lines are most discriminative in 
terms of population age-dating? An answer to this can be 
given by inspecting the properties of the regularized inverse 
model matrix B(/x) defined by Eq. 13111 . In effect, we ex- 
pect the peak to peak amplitude of a column of B(/x) to 
be largest for the most discriminatory wavelengths for age- 
dating. In Fig. |SJ the inverse model matrix was computed 
for a Laplacian penalization with hgcv = 10 2 correspond- 
ing to SNR = 100 per pixel with 60 age bins from 10 Myr to 
20 Gyr and half-solar metallicity. It shows that the Balmer 
lines H a ,i3,-y,s, along with the spectral regions of the Lick 
index NaD, the magnesium indices Mgj, Mg 2 , Mg^ and the 
calcium Ca4227 have strong weight in the age-dating pro- 
cess. Note that the above analysis is clearly noise dependent 
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Figure 6. Black solid line: peak to peak variations of the inverse model matrix discussed in Sect. 13,71 In this example, we took 60 
age bins and fi = 10 2 corresponding to SNR = 100 per pixel with Laplacian penalization. Large values point at age-sensitive parts 
of the spectrum. A 500 Myr single stellar population with half-solar metallicity is shown as reference (grey solid line). The spectral 
domains corresponding to the Lick indices appear as grey-shaded areas. Many of the spectral domains involved in the Lick system seem 
to effectively carry more information than the rest of the spectrum. However, the information is still widely distributed along the whole 
optical range. 



via B(^igcv)- The list of relevant lines will change with the 
SNR. Many of the wiggles and peaks of the inverse model 
remain so far uninterpreted, and many peaks hit spectral 
domains where no referenced index is known, but still con- 
tribute strongly to age separation. Another important fea- 
ture of the inverse model is that most of its norm is in the 
form of low value pixels. If some of the peaks were 2 or 3 
orders of magnitude larger than the average value, we could 
conclude that most of the information is contained exclu- 
sively in the corresponding lines. Yet, the Fig. |S| does not 
allow us to reach this conclusion. Even though the informa- 
tion seems denser in the strongest, well known lines, most of 
it remains in the form of a large number of weaker lines, more 
concentrated in the blue part of our spectra. This supports 
the intuition that a lot of information is left aside by looking 
exclusively at spectral indices, and that the constraints ob- 
tained therefrom are not optimal. Hence our effort to build 
a global spectrum fitting tool. 



4 VALIDATION: BEHAVIOUR OF THE 
LINEAR INVERSION 

Let us now apply STECMAP to mock data, to study the 
biases and the dispersion of the solutions, and to test for dif- 
ferent penalizations. Producing mock data involves choosing 
a model age distribution, xm , and a noise model, e. A mock 
spectrum is then obtained as y = B • xm + e. The corre- 
sponding astrophysical goal is the recovery of the star for- 
mation history of mono-metallic stellar populations (for ex- 



ample superimposed clusters) seen without extinction. The 
stellar age distribution models for these objects are single 
(Sect. 14.1^ or multiple (Sect. l4"2"t star formation episodes of 
approximately Gaussian shape. Recall that no assumption 
on the shape of the distribution is included in the inversion 
process. The only a priori is the smoothness of the solution, 
while the smoothing parameter is set by GCV. Here we re- 
late the results of our simulation to the properties of the 
solution singular vectors, thereby explaining the generation 
of artifacts. 



4.1 Single bump stellar age distribution 

Let us discuss in turn the relationship between the arti- 
facts of the reconstructions and the shape of the solution 
vectors (Sect. 14.1.11 . the flux- averaging of the basis and 
the behaviour of the problem regarding the fiducial model 
(Sect. l4~l. 2(1 . the choice of penalization (Sect. l4~1.3|l . the re- 
quirement to impose positivity (Sect. |4~1.4L and the need 
for an extensive simulation campaign (Sect. l4~1.5ft 



4-1.1 Artifacts and the shape of the solution vectors 

Since any solution is a linear combination of the solution 
vectors v< (see Eq. 1321 ). their shapes impose what kind of 
shape for x can or can not be reconstructed, depending on 
what feature in the observed spectra is best matched by the 
corresponding data singular vectors. 
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Figure 7. Blow-up of the bottom right panel of Fig. Isl showing 
only the mass reconstruction of the oldest bump. The dashed line 
is the model distribution, and the diamonds show the median of 
the recovered age distributions for 10 realizations. The error bars 
showing the dispersion are smaller than the symbol itself. The 
details of the shape of the mass distribution reconstruction trace 
closely the 4th singular vector of the kernel B, with very little 
dispersion, showing that the artifacts and the fine structures of 
the reconstructions are closely related to the properties of the 
single stellar population models. 



Moreover, as regularizing the problem involves attenu- 
ating the high rank terms of Eq. 13211 . the detailed shape of 
the solution is in general given by the first few v;. Figure Q 
shows the stellar mass distribution reconstruction of an old 
population. It is actually a blow-up of the recovery of the 
oldest burst in the bottom right panel of Fig. |5] The penal- 
ization is square Euclidian norm, so that the relevant sin- 
gular vectors are given by the SVD of B. The details of the 
solution are mostly those of the 4th solution singular vector, 
and appear as a systematic artifact (the diamonds are the 
median of 10 realizations, and the dispersion of the solutions 
is smaller than the symbol itself) . The spurious young com- 
ponent between 10 7 ' 5 and 10 8 yr seems to be related to the 
4th singular vector as well, and also appears systematically 
even though it has no physical reality. The fine structure and 
the artifacts of any solution thus rely most on the properties 
of the single stellar population basis rather than on the data 
or even the realization of the noise. 

It is generally impossible to reconstruct accurately the 
shape of the distribution for ages where the singular vec- 
tors display no structure. The right panel of Fig. |5] shows 
that the 10 first singular vectors of the absolute flux kernel 
have very little structure for ages larger than ~ 3 Gyr. Cor- 
respondingly, the right panels of Fig. |H] show that indeed, 
in this range of ages, the shape of the distribution is very 
poorly constrained. 

For an inversion problem to be well behaved, the first 
solution singular vectors, the Vfe, should be rather smooth. 
They should display more and more oscillations as the rank 



k increases (typically k — 1 oscillations) , but remain smooth 
and regular. The unsmooth aspect of our singular vectors 
arises from the temporal roughness in the spectral basis. 
This could also be related to physical fast evolution of the 
single stellar populations in some specific stages of stellar 
evolution, producing variable distance between the elements 
of the basis. It also reflects the non shift-invariance of the 
problem, as is also illustrated by Fig. [3] 

Some further artifacts can however not be trivially ex- 
plained by the solution singular vectors alone. For example 
many of the displayed solutions, even with high SNR, show 
variations far away from the bulk of the signal, seen as mis- 
leading spurious secondary bumps. This artifact is the ana- 
log of Gibbs rings in imaging. It arises because the higher 
frequency modes needed to suppress these secondary oscil- 
lations are attenuated by regularization, and would be best 
identified by examining the GSVD of (B,L). It is the old 
age extension of the low frequency mode involved in build- 
ing the main bump. We will deal with this by introducing 
positivity in Sect. 14. Ql 

4-1.2 Flux-normalized basis and independence from the 
fiducial model 

In practice, one can choose between a basis where the flux 
of each single stellar population is given for IMq (absolute 
flux basis or mass-normalized basis), and a basis where the 
flux of each single stellar population has been normalized to 
the same value (or flux-normalized basis, cf. Sect. 12. lV This 
choice has a physical meaning: in the first case, the unknown 
x will contain mass fractions, whereas in the latter case, it 
will contain flux fractions. 

There are several reasons why we prefer to work with 
the flux-normalized basis. 

It is more directly linked to the luminous properties of 
the observed population (and thus less directly linked to the 
mass): a component of a given flux can not "hide" behind 
another component of similar flux. This is not true for com- 
ponents of similar masses, due to the evolution of M/L(t). 
For instance, in the upper right plot of Fig. |SJ the mass of 
the older components is poorly constrained when the model 
is a young burst. This is expected, because when a young 
component is present, adding the same mass of old stars will 
have very little effect on the integrated optical light. This 
is predictable from the lack of structure beyond 3 Gyr in 
the singular vectors of the right panel of Fig. |U] (see also the 
discussion in Sect. l4.l"Tl . Modulations in this range of ages 
are seen in the vectors of the right panel for the higher rank 
vectors only. On the other hand, the singular vectors of the 
flux-normalized basis (left panel of Fig. display structure 
in the large ages even for low ranks, indicating a better be- 
haviour. And indeed, the upper left plot of Fig.|H|shows that 
all the flux fractions are satisfactorily constrained no matter 
if the model population is young or old. In this respect, the 
"separability" issues tackled later in the article for superim- 
posed populations (Sect. 14.21 are more easily discussed in 
terms of flux fractions. Note that it is however not expected 
that the mass fractions obtained by multiplying the flux 
fractions by M/L(t) be accurate over the whole age range 
(positivity will improve this particular aspect significantly; 
see Sect. 14.1.41 . 

The difference of behaviour between the mass and flux 
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Figure 8. Simulations of the reconstruction of a young, intermediate and old single burst populations. The thick histograms represent the 
models, while the symbols and vertical bars show the median and interquartiles of 10 inversions. Negative values in these reconstructions 
have been set to zero for clarity. Right: Case of an absolute flux basis. The plots thus represent mass fractions. Left: Case of a flux 
normalized basis. Thus are represented flux contributions. The SNR is fixed to 100 per pixel with R = 10 000. The penalizations are 
square Euclidian norm (bottom) and Laplacian (top). In terms of distance to the model, the bumps are best reconstructed in flux 
fractions, and the best penalization is Laplacian. We checked that Laplacian penalization gave flux fraction reconstructions similar to 
the third order penalization, showing that these do not strongly rely on the details of the smoothness a priori. 



fractions reconstructions is also reflected in the variation 
of the transition rank io (see Sect. I3.4P between the noise 
and signal dominated regimes, as shows Fig. IA1I For a 
mass-normalized basis, the transition rank io i ncreases with 
the ag e of the fiducial model x (as defined in lPanter et alJ 
(2003)), from 5 to 20. On the other hand, for a flux- 
normalized basis, the transition rank remains around 7-9 
in this pseudo-observational setup, no matter the age of the 
fiducial model. Ideally, we would like to come up with a 
problem whose behaviour is fixed only by the SNR. In this 
respect, independence of the transition rank io from the fidu- 



cial model is a welcome property. We thus chose to carry on 
with the flux-normalized basis for the rest of the paper. 



4-1.3 Laplacian or square Euclidian norm penalty 

Figure |H| allows us to check which penalization gives the 
solutions with smallest distance to the model. First of all, it 
is quite clear that the square Euclidian norm penalization 
is worst, because it produces both flattened solutions and 
strong artifacts. Indeed, requiring the norm of the solution to 
be small does not explicitly have an effect on the smoothness 
of the solutions. 
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Figure 9. Solution singular vectors of the flux-normalized kernel (left) and the absolute flux basis (right). The vectors are vertically 
offset for lisibility, and the associated singular values are given on the right. The low rank singular vectors of the absolute flux basis are 
very flat in the large ages, indicating that no information about these populations can be obtained unless we have very high SNR. On 
the contrary, fluctuations in large ages are already present in the low rank singular vectors of the flux- normalized basis, which indicates 
the better feasibility of reconstructing the age distribution in the older part. 



Laplacian penalizations give results very similar to the 
third order penalization D3 = Diag 3 [— 1, 3, —3, 1] defined as 
in Eq. I30H . The latter are therefore not plotted, and per- 
form equally well at any SNR. Both produce moderately 
flattened solutions showing increasing dispersion with de- 
creasing SNR, without systematic bias in age. The width of 
these bumps is a simple (but crude) measure of the time res- 
olution of the reconstructions, because any bump narrower 
than the models displayed would be broadened by the in- 
version. The absence of significant difference between the 
results of the Laplacian and third order penalizations shows 
that the inversion does not rely strongly on the details of 
regularization, as long as it involves a differential operator. 
We chose to carry on with the Laplacian penalization for 
the rest of the paper. 

4-1-4 Positivity and Gibbs apodization 

Positivity of the solution is a physically motivated require- 
ment, but it also stabilizes the inversion by strongly reduc- 
ing the explored parameter space. The maximum frequency 
(or best resolution in age) that would be obtained for infi- 
nite SNR is thus not only a matter of basis ill-conditioning 
but also has a methodological component. This is illustrated 
by the slightly better age-resolution (and thus higher fre- 
quency) obtained while relying on positivity as shown in 
Fig. llUI Unfortunately there isn't any simple extension of the 
analytical ill-conditioned problem diagnosis to the non linear 
problem. Also the minimization of Q M defined in Eq. 127H 
requires efficient algorithms as described in Appendix [B] 
As any regularization method, positivity will also introduce 
some bias. Indeed, the solutions in Fig. llOl seem to be slighlty 



asymmetrical compared to the linear solutions. However, one 
strong advantage of positivity is its ability to reduce Gibbs 
ringing. Linear solutions with any penalization exhibit spu- 
rious oscillations even far from the main bump, which can 
be interpreted as a superimposed component. These annoy- 
ing artifacts do not appear in the positive solutions as shows 
Fig. 1101 In many applications, this property turns out to be 
more important than the possible bias it might introduce in 
age estimation. 

4-1-5 Why carry out an extensive simulation campaign? 

An inversion method can perform very well for some spe- 
cially chosen cases while performing poorly generally. As an 
example we discuss the recovery of the age distribution of a 
complex population consisting in a superposition of young, 
intermediate, and old sub-populations. Each of these 3 com- 
ponents contributes equally to the total observed spectrum 
y. The noise is Gaussian. Figure fTTI shows reconstructions 
of the age distribution by the Eq. (13 IP . for 150 realizations, 
with a Laplacian penalization. The reconstruction seems to 
be satisfactory: it is unbiased and the interquartile intervals 
of the solutions shrink with increasing SNR. A naive reading 
of Fig. II II would suggest that we are able to recover nearly 
any age distribution, without bias and with very small error 
for all the time bins, even with quite low SNR, but there 
is a trick. Why do the simulations in Fig. Illl look so good? 
First, the temporal frequency of the solution is lower than 
in the single bump simulations. Second, higher frequency 
sine fuctions are needed to represent a single bump than 
to represent a sinusal curve (one is enough). Thus, as the 
first singular vectors roughly form a basis of sine functions, 
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Figure 10. Same as Fig.[S]with a flux-normalized basis, positivity enforced by quadratic reparameterization and Laplacian penalization. 
Results of simulations for SNR = 100 and SNR = 10 per pixel at Ft = 10 000 are shown. Even though some residual remains, the solution 
sticks to zero where it should, instead of displaying Gibbs rings. 



one needs fewer and lower order solution singular vectors to 
represent a sine function than a bump, and lower SNR. 

One simple (yet unadvisable!) recipe to make good look- 
ing simulations even without regularization could involve the 
following steps: 

(i) choose as model x one of the last few solution singular 
vectors Vfc (or one of the first few if some penalization is 
implemented) 

(ii) compute the corresponding pseudo-data y = B ■ x 

(iii) noise the data at chosen SNR 

(iv) invert and show how close the recovered solution lies 
to the initial model 

By doing so, we managed to produce apparently good look- 
ing simulations down to SNR = 0.1 per pixel. Thus the 
requirement to assess and demonstrate the validity and ef- 
ficiency of the MAP method carried out in this section. 

4.2 Age separation versus R and SNR 

We have already made clear that we can not recover all the 
high frequency oscillations of a given stellar age distribution 
even with very high SNR, but rather moderately slow vari- 
ations, corresponding to smooth solutions. Let us nonethe- 
less consider the special case where a composite population 
consists of two successive bursts, i.e. stellar age distributions 
with two bumps of same luminosity. This is one order of com- 
plexity above the classical characterization of a population 
through one unique age using Lick indices. And indeed, it 
applies to many astrophysically interesting cases. The abil- 
ity to separate the two main populations would allow for 
example to age-date respectively the disc and the bulge of 
unresolved spiral galaxies, or late stages of accretion and star 
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Figure 11. Same as Fig. 1101 with a 1 + sin model for the stellar 
age distribution. The SNR per pixel is given for each experiment 
(10 realizations), and the resolution is -R = 10 000. The smooth- 
ing parameter fi was adjusted by running several simulations and 
choosing the one providing the smallest distance to the model. 
The reconstruction is excellent, but there is a catch: it turns out 
that sine functions are intrinsically easier to recover than sin- 
gle bumps, given the shape of the solution vectors of the kernel. 
Hence, such reconstructions are very misguiding. More systematic 
simulations are required. 
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Figure 12. Recovery of double bursts for several SNR per A. The large circles are the models. Their coordinates (ai,a2) are the ages 
of the two bursts. The smaller circles with error bars show the median and the interquartiles of the recovered ages in 10 reconstructions 
each. The dotted line represents the ai = a<2 limit. Solutions that do not satisfy the quality criteria illustrated in Fig. 1141 are rejected 
and not plotted. The upper diagonal part of each panel shows -R = 2 500 results while the lower diagonal part shows R = 10 000 results. 
Results for the other spectral resolutions down to R w 1 000 are very similar and therefore not shown. Our ability to separate close 
double bursts improves with increasing SNR, but does not significantly change with spectral resolution. The top left panel illustrates the 
definition of the resolution in age as the median length of the segments. Note that the shape of the "unseparable" zone and its evolution 
with SNR are similar to that shown in Fig. |3] 



forming activity in ellipticals in surveys such as SDSS and 
2DFGRS. It would also allow to better constrain the mass 
to light ratio of such complex populations. We wish to inves- 
tigate what observational specifications (spectral resolution, 
SNR) are required to reliably perform such a separation. We 
thus ran extensive simulations of reconstructions of double 
bursts populations. The spectral resolution, SNR and the 
age separation Aage between the 2 bursts were varied, and 
the recovered ages were studied as a function of R, SNR, 
and Aage. Figure 1121 shows the recovered and model age 
couples (ai,<Z2) in several experiments of double bursts su- 
perpositions, for SNR = 20 to 200 per A, at R = 10 000 and 
R = 2 500. The model age grid takes 13 values, separated 
by 0.2 dex, therefore defining 78 age couples. 

These systematic simulations allow us to estimate the 
resolution in age achievable for a given (R, SNR) and the 
corresponding errors. It is a solid, systematic way for testing 
the method in different regimes. The smoothing parameter 
was set for each (R, SNR) by taking the GCV value as a 
guess and fine tuning it in order to obtain stable reconstruc- 



tions of close bumps. The quality of the reconstructions is 
assessed using two criteria: 

• since, in the model, the two bursts have exactly the 
same luminosity, we require that the areas of the two biggest 
bumps have a ratio smaller than 2. 

• the minimum between the two main bumps of the so- 
lution should be fairly low, otherwise it is difficult to state 
whether the populations are truly distinct or part of an ex- 
tended star formation episode. Here we required the mini- 
mum to be lower than 10% of the mean height of the biggest 
bumps. 

The solutions are required to satisfy these two criteria to be 
considered as "good" in terms of age separation. Figure H3l 
shows as an example an acceptable (well-defined bumps, 
minimum at 0), and a rejected solution (bumps and min- 
imum unclear) . In Fig. 1121 we retained exclusively the cases 
satisfying these criteria, i.e. for the other age couples (not 
plotted), the recovered stellar age distributions failed one or 
both criteria. A common failure is the recovery of one wide 
bump instead of two, indicating that the sub-populations 
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Figure 14. Left: Resolution in age, in dex versus SNR per A for various spectral resolutions. As expected, the age resolution improves 
with increasing SNR, and seems to settle around 0.4 dex for the highest SNR. No significant trend is seen with spectral resolution. Right: 
Mean error of the age estimates for the successful cases (according to our criteria). The mean error is approximately one order of 
magnitude smaller than the resolution in age, and decreases with increasing SNR. 
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Figure 13. Selection criterion: the rejected solution shows no 
clear separation, while the accepted one has two clear bumps of 
similar area with a well defined minimum. 



are not separated given the SNR and spectral resolution. 
Thus, the empty region between the successfully separated 
couples and the bisector (dashed line) is a region of "un- 
separable" couples. The width of this region indicates the 
resolution in age that we can achieve. This region shrinks 
with increasing SNR, showing that we can separate two close 
sub-populations more accurately. We superimposed on the 
leftmost panel of Fig. 1121 several vertical segments spanning 
the "unseparable" region. We define the resolution in age as 



the median length of these segments. The statistical error 
on this quantity is of order 0.2 dex for SNR = 20 per A. 

In a realistic observational context, a separation of two 
sub-populations with an age difference lower than the com- 
puted resolution in age should not be attempted, or at 
least not trusted. The resolution in age achieved here is 
a lower limit because no error source other than Gaus- 
sian noise is considered. Other possible sources of noise are 
glitches, residual sky lines, non-sky emission lines (when not 
masked in W), spectrophotometric and wavelength calibra- 
tion error, and models error, along with effects of the age- 
Z-extinction degeneracy (in this section the true metallicity 
of the observed system was known a priori). 

Figure IT21 also shows that the error on both ages of the 
couple of sub-populations decrease on average with increas- 
ing SNR, as expected. For small SNR, the figure is quite 
inconclusive, and the recovered age couples are more or less 
randomly spread all over the age domain, while for high 
SNR, every couple seems to be quite in place, even though 
some couples remain slightly offset. For other resolutions, 
the plots are quite similar, and therefore we do not repro- 
duce them here. The left panel of Fig. 1141 gives a synthesis of 
all the experiments by showing the resolution in age, com- 
puted according to the given definition, versus the SNR per 
A, for several spectral resolutions. The resolution in age im- 
proves with increasing SNR, from 0.9 dex at SNR = 20 per 
A to 0.4 dex at SNR = 200 per A. Given the small num- 
ber of measurements of the width of the unseparable zone 
in each experiment, the variation of the resolution in age 
with spectral resolution is not highly significant, and thus it 
seems that, as long as the SNR per A is conserved, spectral 
resolution does not significantly improve our ability to sep- 
arate sub-populations. The right panel of Fig. H4l shows the 
error on recovered ages versus SNR for the successful sepa- 
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Figure 15. The high resolution spectral energy distribution of 
model extinctionless mono-metallic population with Z = 0.004 is 
inverted using spectral bases with different metallicities for several 
SNR. For SNR = 10 per pixel, the metallicity is moderately well 
constrained (AZ «s 1 dex), while for SNR 30 per pixel all 
the metallicities other than 0.004 can be rejected at the 90% 
confidence level. 



rations, for several spectral resolutions. The error decreases 
with increasing SNR, as expected, and is about ten times 
smaller than the resolution in age for the same SNR. Again, 
no strong trend is seen with spectral resolution. 



4.3 Compressed versus uncompressed data 

In this section, we discuss the similarity between SVD and 
Gram-Schmidt othonormalization (GSO), the decomposi- 
tion s cheme adopted by MOPED 's authors jReichardt et alJ 
1200 ll) . This comparison is carried out in the mono-metallic, 
extinction-less regime. Data can be compressed by multiply- 
ing it by the n singular vectors to obtain n numbers con- 
taining the same information as the whole original spectrum. 
Appendix [D] shows that, the fact that the singular vectors 
are provided by non-truncated SVD or GSO makes little 
difference in the linear regime. The compression can effec- 
tively be lossless, but the conditioning of the problem is un- 
changed, as shown by the inspection of the singular values in 
the left panel of Fig. ID II The right panel of Figure lDll shows 
the result of a Gram-Schmidt othonormalization (Eq. |D2jl) 
and a SVD (Eq. 1221 ') inversion for a composite population in 
a moderately ill-conditioned example. They are equal down 
to machine precision. Minimizing the \ 2 °f the compressed 
data involves the issues discussed in Sect. 13.41 if the com- 
pression is provided via the SVD or GSO singular vectors. 



4.4 Constraints on metallicity? 

When attempting to reconstruct the stellar age distribution 
from real observations, one would still have to guess the 



metallicity of the population. A classical parametric way to 
proceed would be to perform a mono-metallic inversion for 
each of the available metallicities in the basis. If the dom- 
inant observational error is Gaussian, we expect the \ 2 t° 
be minimum when using the true metallicity. However, be- 
cause of the age- metallicity degeneracy, it might not be so 
clear, and one could expect to reach a good x 2 even with 
an erroneous metallicity guess, resulting in an error in age 
estimation. Figure FiTI shows a plot of the reduced \ 2 when 
inverting a population of metallicity Z = 0.004 with a ba- 
sis of different metallicity for several SNR and R — 10 000. 
The smoothing parameter was chosen using GCV with the 
Z — 0.004 kernel. The best fit is always obtained when the 
initial model metallicity is used. We computed the 90% con- 
fidence level by taking as the number of degrees of freedom, 
the number of pixels in the spectrum minus the number of 
age bins (40 in this example) . This choice could be discussed 
because the weights of adjacent time bins are correlated by 
the penalization. However, the number of time bins remains 
far smaller than the number of pixels and thus plays no crit- 
ical role. For SNR = 10 per pixel (i.e. SNR = 20 per A), we 
can not reject fits with wrong metallicities Z G [0.002, 0.009]. 
The error on metallicity can therefore reach 0.35 dex for 
SNR = 10. The range of acceptable metallicities however 
shrinks rapidly with increasing SNR, tightening the con- 
straints. At SNR ^ 30, it is possible to break the age- 
metallicity degeneracy, and thus to let metallicity be a free 
parameter of the inversion problem. 

This closes our detailed investigation of the idealized 
problem of recovering the stellar age distribution of a mono- 
metallic, reddening-free stellar population. 



5 STELLAR CONTENT AND REDDENING 
RECOVERY 

The previous section presented STECMAP in an idealized 
regime, which could only be applied to observations where 
both the metallicity and the extinction are known a priori, 
which is rarely the case in reality. We now present an ex- 
tension of STECMAP accounting for these additional free 
parameters as well. In Sect. 15. ll the full linear age-metallicity 
problem is examined, where both metallicity mixing and age 
mixing are allowed, and study its behaviour. Then, for sim- 
plicity, and given the extremely poor conditioning of this 
problem, the unknown metallicity will be handled specifi- 
cally as an age-metallicity relation. The technique for re- 
constructing the stellar age distribution, the age-metallicity 
relation and the extinction will be presented in Sect. 15.21 
along with a few example simulations in Sect. 15.31 and fi- 
nally its applicability and accuracy will be discussed while 
exploring several observational regimes in Sect. 15.41 



5.1 2D Linear age-metallicity problem 

Here we consider a very composite population where sev- 
eral sub-populations with different ages and metallicities are 
superimposed. Let us define a 2D stellar age and metallic- 
ity distribution A(i, Z) yielding the fraction of optical flux 
emitted by stars with age t € [t, t + dt] and metallicity 
Z £ [Z, Z + dZ] . The model spectrum is the integral of A 
over age and metallicity space. Discretizing as in Sect. [3] wc 
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get the discrete model spectrum as the weighted sum of the 
single stellar populations for all the ages and all the metal- 
licities in the basis. Here the parameter vector is a 2D map 
containing the weights Xij of the single stellar population of 
age ti and metallicity Zj. The model matrix B is the con- 
catenation of the mono- metallic bases described in Sect.|21 
i.e. sequences of single stellar populations in age and metal- 
licity. Its conditioning number is commonly of order 10 s , 
telling us that thorough regularization is required. 

5.1.1 Where is the information on Z? 

In a manner similar to Sect. 13771 we can determine which 
spectral domains are important for age and metallicity de- 
termination. We compute the inverse model matrix B of the 
problem for a given SNRd and look for large peak to peak 
variations in this matrix, indicating spectral features hav- 
ing strong discriminative power, as shown in Fig. 1161 Most 
of the bands involved in the Lick indices carry a lot of in- 
formation. However, some of them, like Ti02, seem to be 
unimportant, and a large number of medium and high reso- 
lution lines not involved in Lick indices actually carry most 
of the information. The comparison with Fig. |5] shows that 
several metallic lines, which were not important for a mono- 
metallic population age distribution recovery, turn out to 
carry a substantial part of the information when the metal- 
licity is unknown. Again, the blue part of the spectrum seems 
to be more discriminative. 

Since age sensitive and metallicity sensitive lines are 
spread along the whole optical wavelength range, any small 
section of the spectrum has good c hances of containing such 
lines (see iLe Borgne et al.l 1(2004) for an example around 
H 7 ). Thus, if the available data does not allow reliable full 
optical domain fitting, plots such as Fig. lKjl are a good start- 
ing point for the search for new high resolution indices. The 
use of the whole spectrum implies some redundancy, but 
considering the sensitivity of the inversion problem to noise, 
this redundancy is highly welcome 5 . 

5.1.2 Age-metallicity degeneracy? 

We carried out the following experiment illustrated in figure 
1171 We produced mock data corresponding to a 2D stellar 
age and metallicity distribution map x and investigated how 
well we could reconstruct it for a given SNR. In the exam- 
ple of figure H7I (top panels), the model is a mono- metallic 
bump centered on 1 Gyr and Z — 0.008. The correspond- 
ing mock data is noised and then inverted as in Eq. 13111 
except that B is now the multi-metallic single stellar popu- 
lation basis defined above. The penalization is Laplacian. In 
this experiment, we focus on the broadening of the bump in 
the metallicity direction as a signature of the age-metallicity 
degeneracy. 

The inspection of the first non attenuated solution sin- 
gular modes tells us about the properties of the regularized 
problem. The panels c and d of Fig. 1171 show the second 
and the fifth solution singular modes of the model matrix 
B. Each of them is an age-metallicity map. The shapes of 

5 the redundancy is also useful in oder to address in part prob- 
lems induced by the poor modeling of some spectral lines 



the stellar age distribution for each metallicity in the second 
singular mode are very similar, indicating bad separability 
between metallicities. Thus, if only the first singular modes 
are recovered, the solutions will have a strong tendancy to 
be flat in the metallicity direction. 

The fifth singular mode is the first one to show a well- 
defined structure: a bump in age, elongated in the metallic- 
ity direction, with a slight shift to larger ages with decreas- 
ing metallicities. This traces the age-metallicity degeneracy: 
a pure mono-metallic population will be reconstructed in 
regularized regimes as a composite, mixing younger metal 
rich single stellar populations with older metal poor single 
stellar populations. The a and b panels of Fig. H7l show re- 
constructions of such age-metallicity maps for R = 10 000, 
SNR = 500 and 200 per pixel. The model consists of a sin- 
gle bump centered on 1 Gyr and Z = 0.008, and the pe- 
nalization is Laplacian. For SNR = 500 per pixel we see 
that the population is effectively reconstructed as a single 
bump in age and metallicity. The age-metallicity degeneracy 
is in this example explicitly broken. The same experiment 
with SNR = 200 per pixel gives a solution degenerate in 
metallicity: the mono-metallic population is seen as the sum 
of three mono-metallic sub-populations contributing nearly 
equally to the total light. The younger component is more 
metal-rich, while the older one is poorer, as is expected for 
age-metallicity degenerate solutions, and is similar to the 
trend seen in the solution singular modes. In this example, 
the smoothing parameter was chosen by generalized cross 
validation. More realizations of this experiment gave sim- 
ilar degenerate solutions. From the shape of the fifth so- 
lution singular mode, we can measure the slope of the age- 
metallicity degeneracy, that is the slope defined by the max- 
ima of the bumps of the singular mode in the age-metallicity 
plane. We find it to b e equal to 0.3, w hich is much smaller 
than the 3/2 given in IWorthevI <ll994h . Smaller slopes indi- 
cate a better definition of age. This is expected because here 
we consider the whole optical range and the continuum as 
reliable. 

As a conclusion, we found 2D age-metallicity map re- 
constructions to be feasible for only very high SNR 500. 
Since this is comparable or larger than SNRb, we consider it 
strongly unphysical. Moreover, from an observational point 
of view, such a high (SNR, R) combination for an outer 
galaxy is generally unreachable in reasonable time with the 
present generation of instruments. Thus, inversions with this 
complexity and SNR are doubly challenging. We now ad- 
dress a simplified version of this problem by reducing the 
metallicity parameters to a ID age-metallicity relation. 



5.2 Non-linear age-metallicity recovery 

In the rest of the paper we assume that the chemical proper- 
ties of the population are represented by an age-metallicity 
relation Z(t) of unknown shape. In contrast to Sect. 15. ll the 
sub-population of age tj is therefore assigned one and only 
one metallicity Zj rather than a metallicity distribution. In 
addition, we now allow the spectral energy distribution to 
be affected by an extinction / ex t(-E, A) parameterized by the 
color excess E. Finally, accounting for the age distribution 
A(t), the observed spectral energy distribution at rest then 
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Figure 16. Same as in Fig. |§]for the linear age-metallicity distribution recovery. The dimensions of the inverse problem are 60 age 
bins and 5 metallicity bins. The smoothing parameter is set by GCV for SNR = 100 per pixel. The grey solid line is a 1 Gyr half-solar 
metallicity single stellar population for reference. Many of the spectral domains involved in the definition of the Lick indices system seem 
to carry more information than the rest of the spectrum. However, the information is still widely spread along the whole optical range 
in the form of medium depth lines, suggesting there is a large number of potential high resolution indices. 



writes: 

F rcst (A) = /oxt(£,A) j A(t)B(X,t,Z(t))dt. (38) 

This model is linear in age distribution A, and non linear 
in metallicity Z and extinction E. Recall that / cxt may be 
replaced by other parametric functions of wavelength that 
could for instance describe flux calibration corrections. 



5.2.1 Discretization and parameters 

Following the same prescription as in Sect. [31 but account- 
ing for extinction, we can derive the discretized version of 
Eq. 1381 . Provided the extinction law is very smooth com- 
pared to the size of the wavelength bins, the model of the 
sampled spectral energy distribution of the reddened com- 
posite stellar population in the i-th spectral bin writes: 

Si = J F rcst (A) gi(X) dX 

~ /ext(-B,Ai) J 9iWj t M A(t)B(X,t,Z(t)) dfdA,(39) 
which simplifies to: 

n 

Bi = fm*(E,\i)'52B i , j x it »€{l,.,m}, (40) 

or in matrix form: 

s = diag(f ext (-B)) -B-x, (41) 



where the kernel matrix B and the vector x of the age distri- 
bution A(t) sampled upon time are defined as in Sect. [2] and 
diag(f cx t) is the diagonal matrix formed from the extinction 
vector: 

fext(S) = (/ext(£, Ax ),..., / cx t(£,A m )) T . (42) 

which contains the extinction law seen by the population 
and depends non-linearly on the color excess E. Note that 
B contains the single stellar population basis for the age- 
metallicity relation vector Z (the age-metallicity relation 
Z(t) sampled in time). 

From a computational point of view, any matrix prod- 
uct involving diag(f ex t(-E)) is very expensive and can be 
profitably implemented using term to term product. How- 
ever, in order to save the introduction of confusing operators, 
we will carry on with the current notation. 



5.2.2 Smoothness a priori with MAP 

The model defined by Eq. H41H is non-linear because of 
the dependancies of T and B on respectively E and Z. 
We can therefore not refer to the classical definition of ill- 
conditioning. However, since the simpler problem solved in 
Sect.|3|is ill-conditioned, it is expected that the more com- 
plex problem treated here will be even more ill-conditioned, 
all the more since we now seek two fields plus one extinction 
parameter. We will thus add a priori information by imple- 
menting smoothness constraints, and allow the unknowns to 
have different smoothing parameters. We define the penal- 
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Figure 17. (a) and (b): Free metallicity reconstructions of a mono-metallic population for SNR = 500 and SNR = 200 per pixel. For 
high SNR, a mono-metallic population is unambiguously recovered, while at lower SNR, a multi-metallic solution appears, indicating 
the degeneracy of the problem, (c) and (d): solution singular modes of the 2D age-metallicity reconstruction problem. The difficulty 
involved in such a reconstruction arises from the very bad conditioning number, and the lack of features of the first singular modes in 
the metallicity direction. 



izing function P smoo th by 

Psmooth(x, Z) = ^ X P(X) + ^zP(Z) , (43) 

where P is the standard quadratic function defined by 1291 . 
5.2.3 Metallicity bounds 

The metallicity range [Z m i n , Z max ] for which models are 
available is bounded. We must therefore find a way to en- 
sure that the solution lies in the desired metallicity range by 
making unwanted values of Z unattractive. To do this we 
use a binding function c (c stands for constraint) which is 
another kind of penalizing function. This technique was pro- 
posed by R. Lane (private communication 6 ). The function c 
must be fiat inside [Z m j n , Z max ] in order not to influence the 
metallicity search and increase gradually outside. We define 
c piecewise by 

((Z Z m i n ) 2 if Z Zmin , 
(Z - Z max ) 2 if Z > Z max , (44) 
else . 

6 http:/ /www.elec.canterbury.ac.nz/staff/Academic/rgl/rgl.htm 



The binding function C used in practice is defined by 

C7(Z) = 5>(^). (45) 

j 

The penalization function we finally adopt is 

P„(x, Z) = P sm0 oth (x, Z) + fx c C(Z) , (46) 

where a binding parameter nc allows to set the repulsiveness 
of the exterior of [zT m i n , Z max ]. The objective function, 

Q M = X 2 (s(x,Z,£)) + P M (x,Z), 

is now fully characterized. Its derivatives are given in the 
appendix 151 

5.3 Simulations of metal dependent LWSAD 

We applied the proposed inversion method to mock data 
for various stellar age distributions, age-metallicity relations, 
extinctions and SNR. In this case, choosing an input model 
involves choosing the functions A(t), Z(t), and a color excess 
E. The corresponding model spectrum is then computed 
following 1411 . Gaussian noise is added to obtain the pseudo- 
data. 
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Figure 1181 shows simulations of reconstructions in the 
case of high quality pseudo-data: R = 10 000 at 4000 - 6800 
A with SNR = 100 per pixel for 100 realizations. The left 
panels show the stellar age distribution while the right pan- 
els show the age-metallicity relations. The top row shows 
reconstructions of a double-burst population where the two 
bursts have different luminous contributions. The young 
component accounts for 75% of the light, and its metallicity 
is a tenth of the old component's, which contributes only to 
25% of the total light. The unbalance between the young and 
old luminous contributions should make it more difficult to 
constrain the old component. Still, the reconstructions are 
good in the sense that the bumps are properly centered and 
scaled. Metallicities are also adequately recovered. The re- 
constructed stellar age distributions are smoothed versions 
of the model, as expected. 

The bottom line plots illustrate the case of a continu- 
ous rather than bumpy stellar age distribution. All ages con- 
tribute equally to the light except the youngest and oldest 
ones. The model age-metallicity relation yields a metallic- 
ity Z{t) that increases with time. The rise and decay of the 
recovered age distribution are adequately located, and the 
metallicities have the correct trend. The metallicities of the 
youngest component are unconstrained simply because they 
do not contribute to the total light. 

For each realization of these simulations, the color ex- 
cess was a random number between and 1. In each case, 
it was recovered with an accuracy better than 10 -2 . 



5.4 Age separation of metal dependent LWSAD 

In a realistic observationnal setting, we would like to age- 
date superimposed populations. For such investigations, it is 
essential to have a good understanding of the limitations of 
the non-parametric method. We therefore investigated again 
how well we could reconstruct two superimposed bursts of 
unknown metallicities and extinction. We proceeded as in 
Sect. 14.21 and the grid of double burst ages is the same. Both 
bursts contribute equally to the total light. In a first set of 
experiments, the model age-metallicity relation is arbitrarily 
chosen as log(Z) = —9.95 + 0.85 log(age[yr]), where the age 
ranges from 50 Myr to 15 Gyr. It is not supposed to be a 
physically motivated choice, but allows us to explore about 
2 decades in metallicity. The allowed range for the solution 
age-metallicity relation is [Z m m = 0.0004, Z max = 0.05]. The 
extinction parameter was chosen randomly between and 
0.5. The reconstructions were performed without any a priori 
for the age-metallicity relation, stellar age distribution or ex- 
tinction parameter, apart from the requirement of smooth- 
ness. For each pseudo-observational context, the smooth- 
ing parameter was set using the GCV value for the mono- 
metallic case and fine-tuned for a small separation between 
2 bursts. The smoothing parameter for the age-metallicity 
relation was set to a large value (around 10 3 ) because we 
just wish to recover a global trend of the metallicity evolu- 
tion in the reconstruction. A flat guess for all variables was 
the starting point. In every case we converged to a stable 
solution in less than 1500 iterations, corresponding to w 1 
minute on a 1 Ghz pc for a R = 10000 basis (i.e. 14000 pixels 
of 0.2 A) with 60 age bins. The distribution of the reduced 
X 2 of the solutions were found to follow a Gaussian distri- 



bution law with unit mean, showing that each experiment 
had properly converged. 

We are thus able to give an estimate of the resolution 
in age versus SNR and spectral resolution. Figure HU1 shows 
some of the results of our simulation campaign. On each 
panel we plotted the results obtained at R = 2 500 (up- 
per octan) and R = 10 000 (lower octan). The results for 
R = 1 000 and R=6000 are very similar and are not shown. 
The number of successful inversions rises with increasing 
SNR, and the unseparable zone in the diagram shrinks. In 
the same way, the error bars and bias reduce with increasing 
SNR. We give the resolution in age for several SNR per A 
and spectral resolutions in Fig. I2UI It improves with increas- 
ing SNR, but settles around 0.8 dex for very high quality 
data. The variation of the resolution in age with spectral 
resolution is not significant compared to the statistical error 
(~ 0.25 dex), so that no trend with spectral resolution can 
clearly be deduced. The middle panel shows the median er- 
ror on the luminous weighted ages of the two bursts for the 
successful separations. The error decreases with increasing 
SNR down to 0.02 dex for SNR = 200 per A, and is signifi- 
cantly lower for the high resolution experiments (the relative 
statistical error for this measure is smaller than 5%). We see 
the same trend in the metallicities median errors of the dou- 
ble bursts, in the right panel. The smallest error is obtained 
for the highest spectral resolution. The general smallness of 
these errors is partly explained by the severity of the selec- 
tion, that rejects as non separable any ambiguous solution. 

Somewhat unexpectedly, the results do not depend on 
the slope of the age-metallicity relation adopted for the 
double-burst models. With a negative slope, a young metal- 
rich population is added to an old metal-poor one. In view of 
the age-metallicity degeneracy, this should be the least favor- 
able situation for a proper separation. We performed simu- 
lations with positive and negative slopes and obtained iden- 
tical results considering the statistical errors given above. 
Thus, the age-metallicity degeneracy is not a limiting factor 
in our experiment. 



6 CONCLUSIONS AND PROSPECTS 

Let us sum up our findings relative to the diagnosis of the 
linear (mono-metallic) problem (Sect. |3] and and the more 
realistic non linear problem of recovering simultaneously 
the luminosity weighted stellar age distribution, the extinc- 
tion and the age-metallicity relation (Sect. |5J in turn, and 
close on the observational and methodological prospects of 
STECMAP. 



6.1 Probing the linear problem: the tricks of the 
trade 

The idealized problem of recovering the non-parametric stel- 
lar age distribution of a mono-metallic population seen with- 
out extinction is linear. The conditioning number of the ker- 
nel is very large and accounts for the ill conditioning of the 
problem, i.e. pathological sensitivity to noise in the data. 

The noise in the single stellar population models also 
limits the number of free parameters that may be recovered 
robustly to describe the star formation history. In textbook 
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Figure 18. Reconstruction of the stellar age distribution (left) and age-metallicity relation (right) for R = 10 000 and SNR = 100 
per pixel. The thick line is the input model. The circles and the bars show respectively the median and the interquartiles of the 
recovered solutions for 100 realizations. The metallicities and flux fractions of the populations with significant contributions are adequately 
recovered. In each experiment, the extinction parameter of the model was chosen randomly and recovered with good accuracy. 



inversion problems, this number can be estimated quanti- 
tatively from the sequence of singular values of the single 
stellar population basis. Here however, this theoretical value 
is misleading because the expected signature of the model 
noise in the singular value spectrum is not apparent. We ex- 
plained this by the correlations between the noise patterns 
in subsequent basis spectra. To obtain the number of free 
parameters, the singular values are used together with an 
independent estimate of the SNR of the basis. For the opti- 
cal spectral range covered with Pegase-HR and ages ranging 
from 50 Myr to 15 Gyr, the corresponding number is 6. This 
makes high frequency variations of the stellar age distribu- 
tion unrecoverable, no matter the data quality, SNRd, and 
the inversion method. 

When the dominant error source is the data, the prob- 
lem may be regularized by truncating the singular value de- 
composition or reducing the number of age bins so that 
o"i/o" n ^ SNRd\A"-- This crude rule can be used to obtain 
a quick estimate of the performance expected for a given 
dataset. 

The problem can be more profitably regularized with- 



out reducing arbitrarily the number of age bins by imposing 
the smoothness of the solution, to obtain a penalized likeli- 
hood estimate. This constraint reduces the risk of overinter- 
preting the data. The smoothing parameter is set automat- 
ically by generalized cross validation for each SNRd, or/and 
by performing simulations in a suited pseudo-observational 
context. 

For an adequately regularized problem, we defined the 
inverse model matrix and inspected it in order to find the 
wavelength ranges which are most discriminative for age de- 
terminations. We found that the information is widely dis- 
tributed along the optical range (cf. Figs, Ibl and 1161 , 

The behaviour of the inversion can be predicted by 
inspecting the singular value decomposition or generalized 
singular value decomposition of the kernel. The first non- 
attenuated solution vectors are responsible for the detailed 
shape of the regularized reconstructions, and thus for the 
generation of artifacts. The general shape of the solution 
vectors, and especially the presence/absence and location of 
their oscillations, gives an indication in which age ranges the 
inversion behaves worst. 
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Figure 19. Same as Fig, ll2l but the mctallicities and the extinction are free parameters. The SNR is given per A. The ability to separate 
close sub-populations improves with SNR, as does the accuracy of the age estimates. 



In particular, the inspection of the SVD components 
revealed that the problem of recovering flux distributions 
was less pathological than the problem of recovering mass 
fractions. More specifically, the transition rank io between 
signal and noise dominated regimes is independent from the 
fiducial model in the recovery of flux fractions. 

Second or third order penalizations gave similarly good 
results, showing that the quality of the inversion does not 
rely strongly on the details of the regularization. 

Requiring the solutions to be positive improves the re- 
sults even further, and in particular reduces Gibbs ringing, 
as can be seen by comparing Fig. |H] and Fig. 1101 

One should be aware that the efficiency of the inverse 
method cannot be assessed on the basis of a small set of 
simulations. Indeed, it is easy to produce good looking re- 
sults down to SNRd = 0.1 per pixel by carefully choosing 
the model age distribution. 

We performed an extensive simulation campaign by in- 
verting a grid of double burst models in several pseudo- 
observational regimes. If the age difference between the 
bursts is larger than 0.4 dex, we were able to separate the 2 
components and recover their ages with a very small error 
from high quality data (SNR d = 200 per A). 

However, the high SNRd regime for which we obtained 
the best results are questionable. Indeed, when SNRd and 
SNRb are comparable, the number of degrees of freedom is 



imposed by the noise in the basis rather than in the data. 
We therefore consider the extreme regimes with SNRd ^ 200 
per A unphysical: small oddities (of uncertain nature) in 
the basis are seen as physically discriminative information. 
Only an improvement of SNRb could in principle increase 
the number of degrees of freedom. Assuming that the singu- 
lar values spectrum of the initial kernel shown in Fig. is 
representative of the basis even at higher SNRb, we can set 
the following rules of thumb. 

(i) If for example SNRd = 100 per pixel, the maximum 
number of freedom degrees one may consider is of order 8 
(n — 8 from criterion 1251 or Fig. 

(ii) To ensure that no serious contamination of the singu- 
lar values by noise in the basis happens for i < 8, one would 
need SNR b ^ 1 000 per pixel (estimated from Fig. 0J (2 500 
per A). We caution that this is an extrapolation, and that 
the actual behaviour of single stellar population spectra at 
this kind of SNR is not known. 

By comparing the solutions given by singular value de- 
composition and the Gram-Schmidt othonormalized kernel 
we showed that ill-conditioning remains an issue when work- 
ing with compressed data. 

Finally, the mismatch observed when a mono-metallic 
population is fitted by a basis of different metallicity allowed 
us to constrain this additional metallicity parameter with 
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Figure 20. Left: Resolution in age [dex] versus SNR per A for various spectral resolutions. As expected, the resolution in age improves 
with increasing SNR. It settles around 0.8 dex for the highest SNR. No significant trend is seen with spectral resolution. Middle: 
Median error on the age of the bursts [dex] in the successful separations vs SNR for several resolutions. High resolution experiments give 
the smallest errors. Right: Same as middle panel but for metallicity estimates. Again, the best accurracy is obtained at high spectral 
resolution, given the same total number of photons. 



a SNRd as small as 10 per pixel, well enough to motivate 
a feasibility study of the recovery of the age distribution, 
the metallicities and the reddening of a composite stellar 
population. 



6.2 Beyond the mono-metallic inversion? 

The ill-conditioned problem of recovering a 2D age- 
metallicity distribution of a composite unreddened popu- 
lation can also be recast into a linear problem. A penalized 
likelihood estimate can be obtained by means of additionnal 
smoothness constraints. The inspection of the regularized in- 
verse model matrix reveals that a large number of age and 
metallicity sensitive lines carrying discriminative informa- 
tion are located all along the optical range. The shape of 
the first solution singular modes shows that age-metallicity 
degenerate solutions are expected even for SNRd as large 
as 200 per pixel. Notwithstanding the above caveat about 
high SNR, the inversions with such a complexity are thus 
unfeasible in realistic regimes from optical integrated light 
only. 

A natural simplification involves assuming that the 
metallicity of the population can be described by a one to 
one non parametric age-metallicity relation. The problem 
of recovering the stellar age distribution, the age-metallicity 
relation and an extiction parameter then becomes tractable 
provided that adequate regularization (smoothness, bound 
and positivity) is implemented, and yields a penalized like- 
lihood estimate. 

A detailed simulation campaign allowed us to estimate 
the resolution in age that can be achieved from optical data 
in several pseudo-observational regimes. If the time elapsed 
between 2 instantaneous bursts is larger than 0.8 dex, they 
can be separated unambiguously by STECMAP from high 
quality data (SNRd = 100 per A), and their ages and metal- 
licities can be constrained with an accuracy of respectively 
0.02 dex and 0.04 dex. In such regimes, the age-metallicity 
degeneracy is effectively broken. For smaller separation, 
there is always a mono-burst or smoother solution that fits 
the data equally well. Our experiments reveal no clear de- 



pendency of the resolution in age on the spectral resolution 
R (J? 1 000) as long as the SNR per A (or integration time) 
is conserved in the comparative experiments. As in the pre- 
liminary conclusion for the idealized mono-metallic unred- 
dened problem, it is not clear whether the extreme SNRd 
are physical or not, since in these regimes the noise in the 
basis is not negligible any more compared to the noise in the 
data. In any case, 0.8 dex should be considered as a lower 
resolution limit, for any separation attempt in the range 
A A = [4 000 A, 6 800 A]. 

The fact that free extinction does not hinder the in- 
versions indicates that the continuum is not a critical con- 
straint. Simulations with more complex corrections on the 
continuum (not described in this paper) confirm this point. 
The information on age and metallicities is carried in the 
line spectrum. 



6.3 Discussion and prospective 

Perhaps the most intriguing conclusions of this paper are the 
small number of degrees of freedom found in an optical single 
stellar population basis even with SNRb as large as Pegase- 
HR, and the very anti-intuitive hint that significantly larger 
SNR is needed in the basis than in the data to be analyzed. 
It highlights the need to study and quantify the influence 
of the models noise in linear and non-linear inversions, and 
to continue and improve the various steps involved in the 
construction of the model. 

Several directions can be followed, on the basis of 
Sect. 12.31 Empirical libraries should improve with the com- 
bination of large collecting areas, and high resolution, large 
coverage instruments with massive multi object capacities, 
which should boost up the construction of libraries by a 
signif icant factor. The library UVES POP feaenulo et alJ 
2003) is an example. With telescopes of the 10m class or 
larger, stars in clusters and in Local Group galaxies can 
be observed to remedy in part the issue of completeness and 
some of the biases of solar neighbourhood libraries (e.g. more 
luminous metal-poor stars, or stars with modified ct-elenient 
abundances) . 
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On the theoretical side, one should investigate accu- 
rately and systematically what drives the shape of the sin- 
gular value spectrum of the SSP basis. In this paper we con- 
centrated on a given SSP model, without tuning the basis to 
study the effect of e.g. sampling strategies on the condition- 
ing. Since the behaviour of an inverse problem depends on 
the shape of the solution singular vectors as well, it is a key 
issue to understand what drives their shape. Making them 
smoother and more regular is a step towards reducing the 
generation of artifacts. Clearly, one would want to question 
the sampling strategy in (T, g, Z) space in terms of both the 
conditioning number of B and the roughness of its singular 
vectors. In particular, one would like for instance to apply 
an error weighted regularized tomographic interpolation in 
(T, g, Z) space, in order to construct a noise free spectral ba- 
sis, which would by construction prevent from interpolating 
the noise from one spectrum to another. Even though the 
interpolation of the noise patterns of individual stars in the 
library may explain the vanishing of the saturation of the 
singular values, we still miss a quantitative relation between 
the density of library stars in (T, g, Z) space, their SNR, and 
the slope of the singular value spectrum. 

Ultimately one should aim at designing inverse methods 
where the errors in the models are explicitly taken into ac- 
count (for instance using TLS) in order to draw a consistent 
error budget. 

The generally very limited separability of successive star 
formation episodes in most pseudo-observational settings is 
in strong contrast with the results of a number of more opti- 
mistic authors. In particular, if one is bound to draw cosmo- 
logical constraints from the stacking of a large set of noisy 
star formation histories, it is still essential to check that in- 
dividual star formation histories are well recovered, since 
otherwise the median solution is likely to be dominated by 
artifacts. Exhaustive testing of the method as we propose is 
in this case a mandatory step. 

The spectral energy distribution matching procedures 
and parameter recovery presented here are absolutely not 
model-dependent and could be used in association with any 
other stellar population model as is 7 . It will thus be interest- 
ing and informative to perform the same kind of study (reso- 
lution in age, conditioning) with other existing evolutionary 
synthesis models, in order to quantify the amount of infor- 
mation and the constraints to be expected from observations 
in other wavelength domains, as the UV, NIR or FIR. It is 
expected that increasing the wavelength coverage should im- 
prove significantly the resolution in age and the behaviour of 
the problem in general. The possible discrepancies between 
the models are also a major matter of concern. For instance, 
are the metallicity constraints using a given set of single 
stellar populations robust to a change of the evolutionary 
synthesis code? It will be interesting to test this by produc- 
ing mock data with one available co de feruzual fc Charlod 
120031 : lOonzalez Deleado et aljl2005h and interpreting them 
with another one. We expect misfits to arise from wavelength 
calibration error, small scale flux calibration errors, and sys- 
tematic deviations caused by the use of different evolutive 
tracks, IMFs, and stellar libraries. This exercise will allow us 



We are preparing a public release of the inversion codes 



to investigate the amount of error introduced by the models 
themselves. 

The methods we described together with the corre- 
sponding error and separability analysis will be very use- 
ful for interpreting large sets of data from large surveys 
such as SDSS, 2DFGRS, DEEP2, • • • , and also for upcom- 
ing new generation instruments, especially high resolution 
instruments with multi- object or field integra l capacities, 
for instance FALCO N JPuech fc Savedell2004D . or MUSE 
iHenault et all2003ft . In this context, astronomers will want 
to extract kinematical information as well, and question the 
relationship between the kinematics and the nature of the 
stellar populations. The simultaneous recovery of the kine- 
matical distribution and the corresponding stellar popula- 
tion via the non-parametric interpretation of spectra is de- 
scribed in a companion paper. 
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APPENDIX A: DEPENDENCE OF THE 
SIGNAL-NOISE TRANSITION ON THE 
FIDUCIAL MODEL 

In this section we clarify the relation between the transi- 
tion rank io between the noise and signal dominated regimes 
(the intersection of the ■ y with the uj ■ e) and the fidu- 
cial model, as defined in Sect. I3T41 and Fig. |2] More specif- 
ically, we explore the shift of the transition by varying the 
age of the fiducial model, for a flux-normalized and a mass- 
normalized basis. The results are shown in Fig. lAll The fidu- 
cial models are given in the bottom of each column. Note 
that the y axis is labeled "flux fractions" on the left and 
"mass fractions" on the right. This is to recall that the in- 
terpretation of the model curve differs, depending on the 
adopted normalization of the basis. Compared to Fig. we 
added a 3rd order polynomial fit to the signal singular coeffi- 
cients and a constant fit to the noise coefficients. This allows 
to detect automatically and objectively the transition rank 
io, as the intersection of the two fits. 

For the mass-normalized basis, the transition moves 
from the 5-th rank (for the youngest fiducial model) up to 
the 20-th (for the oldest fiducial model). On the other hand, 
the location of the transition for the flux-normalized basis 
is rather unaffected by changes of the fiducial model and 
remains around rank 7-9. 



APPENDIX B: GRADIENTS OF Q M 

The direct linear solution which minimizes the objective 
function Q M can only be used in the case of a linear 
model (with respect to the parameters) and without con- 
straints (such as positivity). For all other cases, the ob- 
jective function Q M can only be minimized by means of 
an iterative method. The most efficient and yet simple to 
use of these methods require the computation of the objec- 
tive function and of its gradient. These optimization meth- 
ods are: the conjugate gradients and variable metric meth- 
ods (e.g. BFGS). In practice for non- linear problems, vari- 
able metric methods have been found to require less itera- 
tions and less functio n evaluations than conjugate gradient 
ones l)Thiebaudl2002r i . For that reason, we used the limited 
memory variable metric method VMLM-B implemented in 
the OptimPack package written by E. Thiebaut for Yorick 
( http:/ /www. maumae.net/yorick/doc/index. html |. 

Since the efficiency of these iterative optimization algo- 
rithm rely on the correctness of the gradient of (i.e. par- 
tial derivatives of Q M with respect to the free parameters), 
we devote this appendix to the derivation of such partial 
derivatives for the different cases considered in this paper. 
Whenever it was possible (i.e. in the linear case), the iter- 
ative solutions were tested against the analytical solutions, 
and were found to be identical down to machine precision. 
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Figure Al. Study of the location of the signal-noise transition rank as a function of the fiducial model. The figures are the same 
as Fig. |2l with the same pseudo-observational setting (SNR=100 per pixel), for a flux-normalized (top) and a mass-normalized basis 
(middle) respectively, for 3 different fiducial models x, given at the bottom of each column. Polynomial fits are given for the signal and 
noise singular coefficients. The transition rank in is given in each figure as the intersection of these fits. For the mass-normalized basis, 
the rank of the transition between signal and noise dominated regimes spans a wide range of values depending on the fiducial model 
x. On the contrary, for the flux-normalized basis, the transition rank is rather constant with regard to modifications of the age of the 
fiducial model. 



Bl Simple Linear Model 

In the linear problem, the gradients of Q M have simple ex- 
pressions: 

2j£ = -2B T -W-(y-B-x) , (Bl) 
|§ = 2L T -L-x. (B2) 



B2 Age-Metallicity-Extinction Gradients 



For the resolution of the age-metallicity-extinction problem 
(Sect.[3J, the objective function Q M is a \ 2 penalized by reg- 
ularization terms and a binding function. The regularization 
terms being the same as in the linear case, their gradients 
are given by Eq. 1B21 . The gradient of the binding function 
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C for a metallicity vector Z reads: 



APPENDIX D: GSO VERSUS SVD 



dC 



2 (Zj — Z min ) for Zj < -Zmin , 



— I = <^ 2{Zj- U) for Zj > Z max , (B3) 
' i I else. 



In order to derive the gradients of the x 2 term for more 
complex (non-linear) models, it is useful to rewrite this term 
as: 



X 2 =r T -W-r, 



(B4) 



where, for sake of simplicity, we introduced the vector of 
residuals r defined, in this case, by: 



r = y - diag(f cxt )-B-x . 



(B5) 



Then the derivative of the x term with respect to any free 
parameter, say a, writes: 

d X 2 _ dr~ 



2— Wr. 

oa oa 



(B6) 



Considering the different type of free parameters, we obtain: 



-7^- = -2B T -diag(f cxt )- W-r, 
-QZ = -2x ■ diag(fext) ■ W- r, 

— = -2x B -diagf W l-W-r. 



(B7) 
(B8) 
(B9) 



In the above expressions, <9B/<9Z is derived directly from 
the single stellar population basis B(X,t,Z): 



dB\ 4 f dB(\,t,Z) 



(BIO) 



Similarly, the term 9f ox t / 3-E derives directly from the chosen 
extinction law f ext (E,X): 



/gfcx t \ ^ / gjextCg, A )' 



(Bll) 



APPENDIX C: GENERALIZED SINGULAR 
VALUE DECOMPOSITION 

This section introduces briefly the Generalized singular 
value decomposition which is used in the main text to un- 
derstand how regularization damps smoothly the singular 
vectors according to the SNR. In short, the generalized sin- 
gular value decomposition of (B,L) is defined by: 



In the main text, we claim that Gram-Schmidt othonormal- 
ization amounts to singular value decomposition in the lin- 
ear regime (mono-metallic and extinction-less populations) 
in the absence of truncation. Let us demonstrate and discuss 
this briefly. 

In the mono-metallic extinctionless case, we can expand 
the kernel B as: 



B = O £ V. 



(Dl) 



where O is the Gram-Schmidt orthonormalized kernel ob- 
tained from B, and £ = diag(<ri, . . . , u n ) is a diagonal ma- 
trix such that £ ■ V = O t B is the passage matrix from the 
initial coordinates of the kernel B to the orthonormalized 
basis. In this sense, the <Ji are the norms of the vectors of 
the passage matrix. It is interesting to compare this expan- 
sion with the SVD: the kernel O is orthonormal and the 
matrix £ is diagonal, but the matrix V is not orthogonal. 

Thus, the expansion of Eq. 1D11 is not exactly identical 
to that corresponding to singular value decomposition. Still, 
as long as none of the <7; is zero, the matrix V is inversible, 
and as for the singular value decomposition, we can write 
the solution x as 

x = V- 1 ■ E- 1 ■ O t ■ y = £ (v- 1 ), , (D2) 

where y = B-x is the data, and the (v -1 ); are the columns of 
V -1 . We will in this section abusively call the Oi the singu- 
lar values, the Oi and (v _1 )i respectively the data singular 
vectors and the solution singular vectors. The solution x is 
the sum of the singular coefficients O^ • b (the "compressed 
datum" proposed by MOPED 's authors) divided by the sin- 
gular values &i times the solution singular vector (v -1 )^. 
The left panel of Fig. ID II shows the singular values of the 
SVD and the GSO expansion of the kernel. Their very sim- 
ilar decay indicates similar behaviour of the inverse prob- 
lem. The right panel of Fig. IDll shows for a moderately ill- 
conditioned example (R = 10 000, SNRd = 100, 10 age bins, 
solar metallicity, ai/aio = 2 y^SNRd) the solutions found 
by applying Eq. 1D2II and Eq. 1221 corresponding to the two 
expansions. As expected from the conditioning number and 
SNRd, both are fairly noisy, but the important point is that 
they are actually equal down to machine precision. Thus, 
even though there is a slight formulation difference between 
these two expansions, they practically give the same solu- 
tions. 



B = U ■ £ ■ V 1 , and L = Q V 1 



(CI) 



where U and Q are both orthogonal. The matrix V is non- 
singular and its columns Vj are B T -B and L T -L orthonor- 
mal i.e. V T -B T -B-V = £ 2 and V T -L T -L-V = © 2 . The 
matrices £ and © are diagonal: £ = diag(<7i, 02, ■ ■ ■ , cr n ) 
and = diag(#i, 62, ■ ■ ■ , 6 n ), with the <j j in increasing order 
and the 9i decreasing. See lHansenl <ll994f) for a more detailed 
description of generalized singular value decomposition and 
its properties. 
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Figure Dl. Left: Singular values of the Gram-Schmidt othonormalization and the singular value decomposition of the kernel. Both decays 
are characteristic of an ill-conditioned problem. Right: Solutions found using the Gram-Schmidt othonormalization and the singular value 
decomposition (slightly offset for clarity) for simulated data with SNR = 100 per pixel, R = 10 000. They are identical down to machine 
precision, showing the similarity between both formulations. 



